Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/18 10:40:59 UTC

[GitHub] [arrow] jpedroantunes opened a new pull request #10739: ARROW-13372: [C++][Gandiva] Implement FIND_IN_SET Hive function on Gandiva

jpedroantunes opened a new pull request #10739:
URL: https://github.com/apache/arrow/pull/10739


   Implement FIND_IN_SET Hive function on Gandiva
   
   https://cwiki.apache.org/confluence/display/hive/languagemanual+udf
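
For context, Hive documents find_in_set(str, strList) as returning the 1-based position of the first occurrence of str in the comma-delimited strList, 0 when str is not found or itself contains a comma, and null when either argument is null. Below is a minimal sketch of the non-null part of those semantics. FindInSetSketch is a hypothetical helper written for illustration; it is not the Gandiva kernel from this pull request, and it uses std::string_view rather than Gandiva's pointer-plus-length calling convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical illustration of Hive-style FIND_IN_SET semantics.
// Returns the 1-based index of `needle` among the comma-separated
// elements of `list`, or 0 if it is absent or contains a comma.
int32_t FindInSetSketch(std::string_view needle, std::string_view list) {
  // A needle containing a comma can never equal a single element.
  if (needle.find(',') != std::string_view::npos) return 0;

  int32_t index = 1;
  std::size_t start = 0;
  while (true) {
    const std::size_t comma = list.find(',', start);
    const std::size_t end =
        (comma == std::string_view::npos) ? list.size() : comma;
    // Compare the current element, which may be empty, with the needle.
    if (list.substr(start, end - start) == needle) return index;
    // No further comma means this was the last element.
    if (comma == std::string_view::npos) return 0;
    start = comma + 1;
    ++index;
  }
}
```

Under this sketch, empty elements (for example the one before a leading comma or after a trailing comma) count as elements, which is what lets an empty search string match them.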


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jpedroantunes closed pull request #10739: ARROW-13372: [C++][Gandiva] Implement FIND_IN_SET Hive function on Gandiva

Posted by GitBox <gi...@apache.org>.
jpedroantunes closed pull request #10739:
URL: https://github.com/apache/arrow/pull/10739


   





[GitHub] [arrow] jpedroantunes commented on a change in pull request #10739: ARROW-13372: [C++][Gandiva] Implement FIND_IN_SET Hive function on Gandiva

Posted by GitBox <gi...@apache.org>.
jpedroantunes commented on a change in pull request #10739:
URL: https://github.com/apache/arrow/pull/10739#discussion_r674630494



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -1555,6 +1555,29 @@ TEST(TestStringOps, TestBinaryString) {
   EXPECT_EQ(output, "OM");
 }
 
+TEST(TestStringOps, TestFindInSet) {
+  gandiva::ExecutionContext ctx;
+  uint64_t ctx_ptr = reinterpret_cast<gdv_int64>(&ctx);
+
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 2, "HI,B,C", 6), 1);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 2, ",B,C,HI", 7), 4);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 2, ",B,HI,HI,", 9), 3);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "", 0, ",B,A,A,", 7), 1);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 1, "HI,B,C", 6), 0);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "", 0, "B,C,A,", 6), 4);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "", 0, "B,C,,A,", 6), 3);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "A", 1, "B,A,", 4), 2);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "ha", 2, "hao,mn,hc,ha,hef", 16), 4);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "hef", 3, "hao,mn,hc,ha,hef", 16), 5);

Review comment:
       Good point! Solved







[GitHub] [arrow] anthonylouisbsb commented on a change in pull request #10739: ARROW-13372: [C++][Gandiva] Implement FIND_IN_SET Hive function on Gandiva

Posted by GitBox <gi...@apache.org>.
anthonylouisbsb commented on a change in pull request #10739:
URL: https://github.com/apache/arrow/pull/10739#discussion_r674392860



##########
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##########
@@ -1555,6 +1555,29 @@ TEST(TestStringOps, TestBinaryString) {
   EXPECT_EQ(output, "OM");
 }
 
+TEST(TestStringOps, TestFindInSet) {
+  gandiva::ExecutionContext ctx;
+  uint64_t ctx_ptr = reinterpret_cast<gdv_int64>(&ctx);
+
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 2, "HI,B,C", 6), 1);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 2, ",B,C,HI", 7), 4);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 2, ",B,HI,HI,", 9), 3);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "", 0, ",B,A,A,", 7), 1);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "HI", 1, "HI,B,C", 6), 0);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "", 0, "B,C,A,", 6), 4);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "", 0, "B,C,,A,", 6), 3);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "A", 1, "B,A,", 4), 2);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "ha", 2, "hao,mn,hc,ha,hef", 16), 4);
+  EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "hef", 3, "hao,mn,hc,ha,hef", 16), 5);

Review comment:
       I think it would be good to add a test case in which the search string itself contains a comma. Hive's FIND_IN_SET returns 0 in that case:
   ```cpp
     // The length must cover all of "hao,", including the trailing comma.
     EXPECT_EQ(find_in_set_utf8_utf8(ctx_ptr, "hao,", 4, "hao,mn,hc,ha,hef", 16), 0);
   ```

##########
File path: cpp/src/gandiva/tests/projector_test.cc
##########
@@ -779,6 +779,43 @@ TEST_F(TestProjector, TestModZero) {
   EXPECT_ARROW_ARRAY_EQUALS(exp_mod, outputs.at(0));
 }
 
+TEST_F(TestProjector, TestFindInSet) {
+  // schema for input fields
+  auto field0 = field("f0", arrow::utf8());
+  auto field1 = field("f1", arrow::utf8());
+  auto schema = arrow::schema({field0, field1});
+
+  // output fields
+  auto field_find = field("find_in_set", arrow::int32());
+
+  // Build expression
+  auto find_expr =
+      TreeExprBuilder::MakeExpression("find_in_set", {field0, field1}, field_find);
+
+  std::shared_ptr<Projector> projector;
+  auto status = Projector::Make(schema, {find_expr}, TestConfiguration(), &projector);
+  EXPECT_TRUE(status.ok()) << status.message();
+
+  // Create a row-batch with some sample data
+  int num_records = 4;
+  auto array0 = MakeArrowArrayUtf8({"A", "", "Z", "A"}, {true, true, true, true});

Review comment:
       Add a test that passes some of the values as null, to ensure that the result is null too.
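
The null behaviour requested here can be sketched independently of the search logic. The example below assumes null-in, null-out semantics (Hive documents find_in_set as returning null if either argument is null) and models nullable values with std::optional. EvaluateNullIfNull is a hypothetical wrapper for illustration only; it is not Gandiva's API and makes no claim about how the engine tracks validity internally.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-in for a nullable utf8 value.
using NullableString = std::optional<std::string>;

// Applies `kernel` only when both inputs are non-null; otherwise the
// result is null. `kernel` is any callable taking two strings and
// returning an int32_t position.
template <typename Kernel>
std::optional<int32_t> EvaluateNullIfNull(Kernel kernel,
                                          const NullableString& needle,
                                          const NullableString& list) {
  if (!needle.has_value() || !list.has_value()) {
    return std::nullopt;  // A null input yields a null output.
  }
  return kernel(*needle, *list);
}
```

A test in this style would feed rows where the first input, the second input, or both are null and check that each corresponding output is null, alongside rows where both are valid.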







[GitHub] [arrow] github-actions[bot] commented on pull request #10739: ARROW-13372: [C++][Gandiva] Implement FIND_IN_SET Hive function on Gandiva

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10739:
URL: https://github.com/apache/arrow/pull/10739#issuecomment-882036013


   https://issues.apache.org/jira/browse/ARROW-13372

