You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/03 09:47:34 UTC

[GitHub] [arrow] siddhantrao23 opened a new pull request, #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

siddhantrao23 opened a new pull request, #14579:
URL: https://github.com/apache/arrow/pull/14579

   The current conversion of sql like patterns to pcre doesn't work if the escape character is one of the special characters to pcre. Fix this by not escaping those characters if it the escape character itself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14579:
URL: https://github.com/apache/arrow/pull/14579#issuecomment-1301855683

   https://issues.apache.org/jira/browse/ARROW-18235


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14579:
URL: https://github.com/apache/arrow/pull/14579#issuecomment-1301855711

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1015384213


##########
cpp/src/gandiva/regex_util.cc:
##########
@@ -44,6 +45,10 @@ Status RegexUtil::SqlLikePatternToPcre(const std::string& sql_pattern, char esca
 
       cur = sql_pattern.at(idx);
       if (cur == '_' || cur == '%' || cur == escape_char) {
+        if (cur == '\\') {

Review Comment:
   This extra condition was needed to escape the '\\' in the pcre pattern if the escape character itself is a '\\'.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14579:
URL: https://github.com/apache/arrow/pull/14579#issuecomment-1318522985

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/965e516e53cd46beb0ad92444cf1aaca...8694d602766344daa596071aeb18a281/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1019590365


##########
cpp/src/gandiva/regex_util.cc:
##########
@@ -44,6 +45,10 @@ Status RegexUtil::SqlLikePatternToPcre(const std::string& sql_pattern, char esca
 
       cur = sql_pattern.at(idx);
       if (cur == '_' || cur == '%' || cur == escape_char) {
+        if (cur == '\\') {

Review Comment:
   This doesn't look correct to me. You should check `cur == escape_char` instead of `curr == '\\'`
   
   @kou in sql like pattern only _, % and escape_char are special characters. And escape_character should only be followed by '_', '%' or the escape char itself as mentioned in the comment in line 40. so we don't need to check for other pcre special characters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
kou commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1013369863


##########
cpp/src/gandiva/regex_util.cc:
##########
@@ -44,6 +45,10 @@ Status RegexUtil::SqlLikePatternToPcre(const std::string& sql_pattern, char esca
 
       cur = sql_pattern.at(idx);
       if (cur == '_' || cur == '%' || cur == escape_char) {
+        if (cur == '\\') {

Review Comment:
   Is this condition correct? It seems that we need to check whether the `cur` (== `escape_char`) is a special character or not:
   
   ```suggestion
           if (pcre_regex_specials_.find(cur) != pcre_regex_specials_.end()) {
   ```
   
   BTW, can we extract this condition to out of this `for` loop if my suggestion is correct?



##########
cpp/src/gandiva/tests/filter_test.cc:
##########
@@ -414,6 +414,39 @@ TEST_F(TestFilter, TestLike) {
 
   // Validate results
   EXPECT_ARROW_ARRAY_EQUALS(exp, selection_vector->ToArray());
+
+  auto literal_escape_pattern =
+      TreeExprBuilder::MakeStringLiteral("%tu^_security^_freeze%");
+  auto escape_char = TreeExprBuilder::MakeStringLiteral("^");

Review Comment:
   Can we use a PCRE special character that is invalid without suitable escap such as `[` instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1019590365


##########
cpp/src/gandiva/regex_util.cc:
##########
@@ -44,6 +45,10 @@ Status RegexUtil::SqlLikePatternToPcre(const std::string& sql_pattern, char esca
 
       cur = sql_pattern.at(idx);
       if (cur == '_' || cur == '%' || cur == escape_char) {
+        if (cur == '\\') {

Review Comment:
   This doesn't look correct to me. You should check `cur == escape_char` instead of `curr == '\\'`
   
   @kou in sql like pattern only \_, % and escape_char are special characters. And escape_character should only be followed by '_', '%' or the escape char itself as mentioned in the comment in line 40. so we don't need to check for other pcre special characters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1015385830


##########
cpp/src/gandiva/tests/filter_test.cc:
##########
@@ -414,6 +414,39 @@ TEST_F(TestFilter, TestLike) {
 
   // Validate results
   EXPECT_ARROW_ARRAY_EQUALS(exp, selection_vector->ToArray());
+
+  auto literal_escape_pattern =
+      TreeExprBuilder::MakeStringLiteral("%tu^_security^_freeze%");
+  auto escape_char = TreeExprBuilder::MakeStringLiteral("^");

Review Comment:
   The escape char used here ('^') is one of the pcre spacial characters. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] siddhantrao23 commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
siddhantrao23 commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1015384213


##########
cpp/src/gandiva/regex_util.cc:
##########
@@ -44,6 +45,10 @@ Status RegexUtil::SqlLikePatternToPcre(const std::string& sql_pattern, char esca
 
       cur = sql_pattern.at(idx);
       if (cur == '_' || cur == '%' || cur == escape_char) {
+        if (cur == '\\') {

Review Comment:
   This extra condition was needed to escape the '\' in the pcre pattern if the escape character itself is a '\'.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14579:
URL: https://github.com/apache/arrow/pull/14579#issuecomment-1318522693

   Benchmark runs are scheduled for baseline = 5f8cc745b1fe11acf4ef3d508efc2ff7558894d6 and contender = e0e7ba824f56460cdb7dd9ffa6779de93b62d121. e0e7ba824f56460cdb7dd9ffa6779de93b62d121 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/154cf040775a437092db4ce6cf0cd778...82bbbc2c0c2a453482b7b3750032c058/)
   [Finished :arrow_down:0.3% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1abbf4d35e3b4765b96354a0b07c5f3b...8b4eb50a34e9428da7745daea60629d5/)
   [Failed :arrow_down:0.27% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/965e516e53cd46beb0ad92444cf1aaca...8694d602766344daa596071aeb18a281/)
   [Finished :arrow_down:0.1% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/0049d3eadaf6482b9d5d72a5026ac989...856c43423ab344a1ae9fc3cbf02d0959/)
   Buildkite builds:
   [Finished] [`e0e7ba82` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1884)
   [Finished] [`e0e7ba82` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1906)
   [Finished] [`e0e7ba82` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1875)
   [Finished] [`e0e7ba82` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1898)
   [Finished] [`5f8cc745` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1883)
   [Finished] [`5f8cc745` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1905)
   [Failed] [`5f8cc745` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1874)
   [Finished] [`5f8cc745` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1897)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1019594151


##########
cpp/src/gandiva/tests/filter_test.cc:
##########
@@ -414,6 +414,39 @@ TEST_F(TestFilter, TestLike) {
 
   // Validate results
   EXPECT_ARROW_ARRAY_EQUALS(exp, selection_vector->ToArray());
+
+  auto literal_escape_pattern =
+      TreeExprBuilder::MakeStringLiteral("%tu^_security^_freeze%");

Review Comment:
   can you use an example with an escape character followed by an escape character. for example "^^" here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] projjal commented on a diff in pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
projjal commented on code in PR #14579:
URL: https://github.com/apache/arrow/pull/14579#discussion_r1019590365


##########
cpp/src/gandiva/regex_util.cc:
##########
@@ -44,6 +45,10 @@ Status RegexUtil::SqlLikePatternToPcre(const std::string& sql_pattern, char esca
 
       cur = sql_pattern.at(idx);
       if (cur == '_' || cur == '%' || cur == escape_char) {
+        if (cur == '\\') {

Review Comment:
   This doesn't look correct to me. You should check `cur == escape_char` instead of `curr == '\\'`
   
   @kou in sql like pattern only _, % and escape_char are special characters. And escape_character should only be followed by '_', '%' or the escape char itself as mentioned in the comment above. so we don't need to check for other pcre special characters.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou merged pull request #14579: ARROW-18235: [C++][Gandiva] Fix the like function implementation for escape chars

Posted by GitBox <gi...@apache.org>.
kou merged PR #14579:
URL: https://github.com/apache/arrow/pull/14579


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org