You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/13 15:04:51 UTC

[GitHub] [arrow-rs] ovr opened a new pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

ovr opened a new pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042


   Hello!
   
   # Which issue does this PR close?
   
   I found that DF returns true regexp `%(%)%`
   
   ```
   > SELECT 'int' LIKE '%(%)%', 'int(255)' LIKE '%(%)%';
   +--------------------------------+-------------------------------------+
   | Utf8("int") Like Utf8("%(%)%") | Utf8("int(255)") Like Utf8("%(%)%") |
   +--------------------------------+-------------------------------------+
   | true                           | true                                |
   +--------------------------------+-------------------------------------+
   1 row in set. Query took 0.010 seconds.
   ```
   
   But it should return false for the first case. 
   
   Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#issuecomment-998084945


   Filed https://github.com/apache/arrow-rs/issues/1069 to track additional characters (other than `(` and `)`).
   
   Thanks @ovr  and @jwdeitch 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#discussion_r768178551



##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -360,7 +360,11 @@ pub fn like_utf8_scalar<OffsetSize: StringOffsetSizeTrait>(
             }
         }
     } else {
-        let re_pattern = right.replace("%", ".*").replace("_", ".");
+        let re_pattern = right
+            .replace("%", ".*")
+            .replace("_", ".")
+            .replace("(", r#"\("#)

Review comment:
       Thinking about his more, I wonder if other special regexp characters need to be replaced too (e.g. `.` and `+`)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#discussion_r768118546



##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -2228,6 +2240,13 @@ mod tests {
         vec![true, true, true, false, false, true, false]
     );
 
+    test_utf8_scalar!(
+        test_utf8_array_like_scalar_escape_testing,
+        vec!["varchar(255)", "int(255)", "varchar", "int"],
+        "%(%)%",
+        like_utf8_scalar,
+        vec![true, true, false, false]

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#discussion_r768595313



##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -360,7 +360,11 @@ pub fn like_utf8_scalar<OffsetSize: StringOffsetSizeTrait>(
             }
         }
     } else {
-        let re_pattern = right.replace("%", ".*").replace("_", ".");
+        let re_pattern = right
+            .replace("%", ".*")
+            .replace("_", ".")
+            .replace("(", r#"\("#)

Review comment:
       that is a great call @jwdeitch  -- so something like 
   
   ```rust
   let re_pattern = escape(right)
     .replace('%', '.*')
     .replace('_', '.')?
   ```
   
   Perhaps




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] seddonm1 commented on a change in pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
seddonm1 commented on a change in pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#discussion_r769072767



##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -360,7 +360,11 @@ pub fn like_utf8_scalar<OffsetSize: StringOffsetSizeTrait>(
             }
         }
     } else {
-        let re_pattern = right.replace("%", ".*").replace("_", ".");
+        let re_pattern = right
+            .replace("%", ".*")
+            .replace("_", ".")
+            .replace("(", r#"\("#)

Review comment:
       I found the rust regex crate syntax difficult at the time so I would suggest adding test cases for all their escaping. Regex is so powerful it would be easy to miss something.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#discussion_r768179758



##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -360,7 +360,11 @@ pub fn like_utf8_scalar<OffsetSize: StringOffsetSizeTrait>(
             }
         }
     } else {
-        let re_pattern = right.replace("%", ".*").replace("_", ".");
+        let re_pattern = right
+            .replace("%", ".*")
+            .replace("_", ".")
+            .replace("(", r#"\("#)

Review comment:
       the list of special characters is here: https://docs.rs/regex/1.5.4/regex/#syntax




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#issuecomment-992585074


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#1042](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e6e3244) into [master](https://codecov.io/gh/apache/arrow-rs/commit/7e44ca8f3c02a1f7f65fd85138fa7ed743538f15?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7e44ca8) will **increase** coverage by `0.00%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/1042/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #1042   +/-   ##
   =======================================
     Coverage   82.31%   82.31%           
   =======================================
     Files         168      168           
     Lines       49031    49037    +6     
   =======================================
   + Hits        40359    40365    +6     
     Misses       8672     8672           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow-rs/pull/1042/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb21wYXJpc29uLnJz) | `93.27% <100.00%> (+0.05%)` | :arrow_up: |
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/1042/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `85.10% <0.00%> (-0.14%)` | :arrow_down: |
   | [arrow/src/datatypes/datatype.rs](https://codecov.io/gh/apache/arrow-rs/pull/1042/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy9kYXRhdHlwZS5ycw==) | `66.38% <0.00%> (+0.42%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [7e44ca8...e6e3244](https://codecov.io/gh/apache/arrow-rs/pull/1042?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jwdeitch commented on a change in pull request #1042: fix(compute): LIKE/ILIKE/NLIKE escape parenthesis

Posted by GitBox <gi...@apache.org>.
jwdeitch commented on a change in pull request #1042:
URL: https://github.com/apache/arrow-rs/pull/1042#discussion_r768188149



##########
File path: arrow/src/compute/kernels/comparison.rs
##########
@@ -360,7 +360,11 @@ pub fn like_utf8_scalar<OffsetSize: StringOffsetSizeTrait>(
             }
         }
     } else {
-        let re_pattern = right.replace("%", ".*").replace("_", ".");
+        let re_pattern = right
+            .replace("%", ".*")
+            .replace("_", ".")
+            .replace("(", r#"\("#)

Review comment:
       good call. maybe we should pass `right` through this, then add the wildcards 
   https://docs.rs/regex/0.2.2/regex/fn.escape.html
   
   (I'm happy to help test this @ovr )




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org