You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/27 02:13:46 UTC

[GitHub] [arrow] ZMZ91 opened a new pull request #11016: Bugfix/limit return chars count

ZMZ91 opened a new pull request #11016:
URL: https://github.com/apache/arrow/pull/11016


   - add max/min return length for space/lpad/rpad udfs
   - correct return length 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r743021359



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1851,8 +1881,10 @@ const char* rpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
   } else {
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string right
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(
+        text_len, text_char_count, return_length, fill_text, fill_text_len);
+    char* ret = reinterpret_cast<gdv_binary>(
+        gdv_fn_context_arena_malloc(context, return_char_length));

Review comment:
       same comment as above. Shouldn't the while loop in 1899 use return_char_length instead of return length?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r753091928



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       Can you check with @projjal ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-977075318


   @ZMZ91 try to rebase with master again. The error in the workflow was fixed, I tested with a PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ZMZ91 commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ZMZ91 commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r749944769



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       What should I do next? I have no permission to merge this pr, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pravindra closed pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
pravindra closed pull request #11016:
URL: https://github.com/apache/arrow/pull/11016


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-979128899


   Benchmark runs are scheduled for baseline = 8aa1ead601ea28ba6774c928a1348e65005de4f1 and contender = 3346fa65a0785b19f1283f57fa093f459c894708. 3346fa65a0785b19f1283f57fa093f459c894708 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b3e2e3f6db624bd6bd2c39c9df5f3c58...95192370ee4d4fce80aae94eb7bb91ad/)
   [Failed] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c9c36d41e64b46ffb735fc0b9f773a10...18670afcad7d4af79db289e6bb2f9e57/)
   [Finished :arrow_down:0.09% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4a6058dbbc8e4aac92c85a06e65739f8...137228c0bf0f4950896afc9a9bf00156/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ZMZ91 commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ZMZ91 commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r718106328



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       Thank you @anthonylouisbsb! BTW, could you help check my another pr https://github.com/apache/arrow/pull/10884? That one was opened even much earlier than this one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r743031788



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1851,8 +1881,10 @@ const char* rpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
   } else {
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string right
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(
+        text_len, text_char_count, return_length, fill_text, fill_text_len);
+    char* ret = reinterpret_cast<gdv_binary>(
+        gdv_fn_context_arena_malloc(context, return_char_length));

Review comment:
       same comment as above. Shouldn't the while loop in 1899 use return_char_length instead of return length?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-978155285


   @ZMZ91 I will ask for @pravindra to merge it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r714798261



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       @projjal can you give us help with that problem? I do not know why the build is still failing even with the tip I gave to @ZMZ91 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-979128899


   Benchmark runs are scheduled for baseline = 8aa1ead601ea28ba6774c928a1348e65005de4f1 and contender = 3346fa65a0785b19f1283f57fa093f459c894708. 3346fa65a0785b19f1283f57fa093f459c894708 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b3e2e3f6db624bd6bd2c39c9df5f3c58...95192370ee4d4fce80aae94eb7bb91ad/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c9c36d41e64b46ffb735fc0b9f773a10...18670afcad7d4af79db289e6bb2f9e57/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4a6058dbbc8e4aac92c85a06e65739f8...137228c0bf0f4950896afc9a9bf00156/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-979128899


   Benchmark runs are scheduled for baseline = 8aa1ead601ea28ba6774c928a1348e65005de4f1 and contender = 3346fa65a0785b19f1283f57fa093f459c894708. 3346fa65a0785b19f1283f57fa093f459c894708 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b3e2e3f6db624bd6bd2c39c9df5f3c58...95192370ee4d4fce80aae94eb7bb91ad/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c9c36d41e64b46ffb735fc0b9f773a10...18670afcad7d4af79db289e6bb2f9e57/)
   [Finished :arrow_down:0.09% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4a6058dbbc8e4aac92c85a06e65739f8...137228c0bf0f4950896afc9a9bf00156/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r743021359



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?
   
   Never mind, the text_char_count was confusing. I think the text_char_count is actually text_len - invalid utf8..maybe rename 'text_char_count' to 'actual_text_len'




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ZMZ91 commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ZMZ91 commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-977366435


   > @ZMZ91 try to rebase with master again. The error in the workflow was fixed, I tested with a PR.
   
   Thank you! It worked! Could you help merge it then?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r753092593



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       ALso you may have to rebase and make the failing checks pass




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r710193573



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       I think the error is appearing in the AppVeyor build is because you are using a value that is lower than the minimum integer.
   
   Try to change to `-2639077559L`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ZMZ91 commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ZMZ91 commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r745261537



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       Changed to actual_text_len. @rkavanap if no more issues, could you help merge this pr since it's been quite a while since open. Thanks a lot.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r743021359



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1851,8 +1881,10 @@ const char* rpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
   } else {
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string right
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(
+        text_len, text_char_count, return_length, fill_text, fill_text_len);
+    char* ret = reinterpret_cast<gdv_binary>(
+        gdv_fn_context_arena_malloc(context, return_char_length));

Review comment:
       same comment as above. Shouldn't the while loop in 1899 use return_char_length instead of return length?

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?
   
   Never mind, the text_char_count was confusing. I think the text_char_count is actually text_len - invalid utf8..maybe rename 'text_char_count' to 'actual_text_len'

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1851,8 +1881,10 @@ const char* rpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
   } else {
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string right
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(
+        text_len, text_char_count, return_length, fill_text, fill_text_len);
+    char* ret = reinterpret_cast<gdv_binary>(
+        gdv_fn_context_arena_malloc(context, return_char_length));

Review comment:
       same comment as above. Shouldn't the while loop in 1899 use return_char_length instead of return length?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r743021359



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       return_char_length seems to be used only for allocation. Shouldn't the while loop in 1832 also use return_char_length?
   
   Never mind, the text_char_count was confusing. I think the text_char_count is actually text_len - invalid utf8..maybe rename 'text_char_count' to 'actual_text_len'

##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1851,8 +1881,10 @@ const char* rpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
   } else {
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string right
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(
+        text_len, text_char_count, return_length, fill_text, fill_text_len);
+    char* ret = reinterpret_cast<gdv_binary>(
+        gdv_fn_context_arena_malloc(context, return_char_length));

Review comment:
       same comment as above. Shouldn't the while loop in 1899 use return_char_length instead of return length?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r717699631



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       @ZMZ91 I think Projjal's suggestion worked! I looked at the failed tests and it seems that was caused by environment issues when it tries to install dependencies.
   
   Do the rebase with Arrow's master repository and push it to execute the builds again!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r717699631



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       @ZMZ91 I think Projjal's suggestion worked! I looked at the failed tests and it seems that was caused by environment issues when it tries to install dependencies.
   
   Do the rebase with Arrow's master repository and push it to execute the builds again!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #11016: Bugfix/limit return chars count

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-906872398


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW
   
   Opening JIRAs ahead of time contributes to the [Openness](http://theapacheway.com/open/#:~:text=Openness%20allows%20new%20users%20the,must%20happen%20in%20the%20open.) of the Apache Arrow project.
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   or
   
       MINOR: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-906873730


   https://issues.apache.org/jira/browse/ARROW-13780


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ZMZ91 commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ZMZ91 commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r714418401



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       Seems still not working. But thank you anyway. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] projjal commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
projjal commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r714945847



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       Try `-2639077559LL`. long long is guaranteed to be atleast 64 bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-979128899


   Benchmark runs are scheduled for baseline = 8aa1ead601ea28ba6774c928a1348e65005de4f1 and contender = 3346fa65a0785b19f1283f57fa093f459c894708. 3346fa65a0785b19f1283f57fa093f459c894708 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b3e2e3f6db624bd6bd2c39c9df5f3c58...95192370ee4d4fce80aae94eb7bb91ad/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c9c36d41e64b46ffb735fc0b9f773a10...18670afcad7d4af79db289e6bb2f9e57/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4a6058dbbc8e4aac92c85a06e65739f8...137228c0bf0f4950896afc9a9bf00156/)
   Supported benchmarks:
   ursa-i9-9960x: langs = Python, R, JavaScript
   ursa-thinkcentre-m75q: langs = C++, Java
   ec2-t3-xlarge-us-east-2: cloud = True
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-976210957


   @ZMZ91 I think the error in the falling workflow is not related to your changes in this PR. We need to wait for the fix in the master branch and then rebase your branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] anthonylouisbsb commented on pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#issuecomment-976007998


   @ZMZ91 rebase your branch with master and I think the failing builds you be fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ZMZ91 commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
ZMZ91 commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r718106328



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -92,6 +96,12 @@ TEST(TestStringOps, TestSpace) {
   EXPECT_EQ(std::string(out, out_len), "    ");
   out = space_int64(ctx_ptr, -5, &out_len);
   EXPECT_EQ(std::string(out, out_len), "");
+  out = space_int64(ctx_ptr, 65536, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, 9223372036854775807, &out_len);
+  EXPECT_EQ(std::string(out, out_len), std::string(65536, ' '));
+  out = space_int64(ctx_ptr, -2639077559, &out_len);

Review comment:
       Thank you @anthonylouisbsb! BTW, could you help check my another pr https://github.com/apache/arrow/pull/10884? That one was opened even much earlier than this one.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rkavanap commented on a change in pull request #11016: ARROW-13780: [Gandiva][UDF] Fix bug in udf space/rpad/lpad

Posted by GitBox <gi...@apache.org>.
rkavanap commented on a change in pull request #11016:
URL: https://github.com/apache/arrow/pull/11016#discussion_r746256866



##########
File path: cpp/src/gandiva/precompiled/string_ops.cc
##########
@@ -1790,8 +1816,10 @@ const char* lpad_utf8_int32_utf8(gdv_int64 context, const char* text, gdv_int32
     // case (return_length > text_char_count)
     // case where it needs to copy "fill_text" on the string left. The total number
     // of chars to copy is given by (return_length -  text_char_count)
-    char* ret =
-        reinterpret_cast<gdv_binary>(gdv_fn_context_arena_malloc(context, return_length));
+    gdv_int32 return_char_length = evaluate_return_char_length(

Review comment:
       @ZMZ91 I am done with my review and this PR looks good. Please let me know what else I need to do to help merge the PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org