You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/01 11:03:28 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5819: feat: Add expression rewrite rules for LIKE and ILIKE

alamb commented on code in PR #5819:
URL: https://github.com/apache/arrow-datafusion/pull/5819#discussion_r1155093800


##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -1118,6 +1119,22 @@ impl<'a, S: SimplifyInfo> TreeNodeRewriter for Simplifier<'a, S> {
                 right,
             }) => simplify_regex_expr(left, op, right)?,
 
+            // Rules for Like
+            Expr::Like(Like {
+                expr,
+                pattern,
+                negated,
+                escape_char: _,
+            }) if !is_null(&expr) && pattern.eq(&Box::new(lit("%"))) => lit(!negated),

Review Comment:
   By writing `Box::new(...)` I think this is goign to do two allocations each time (one for the Box and one for the String in the literal)
   
   Would it be possible to rewrite this to a match statement without allocation?
   
   ```rust
   matches!(pattern.as_ref(), Expression::Literal(ScalarValue::Utf8(..)))
   ```
   ?
   
   You might have to define some better way to check for the literal scalar



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -2320,16 +2337,10 @@ mod tests {
         assert_no_change(regex_match(col("c1"), lit("f_o")));
 
         // empty cases
-        assert_change(regex_match(col("c1"), lit("")), like(col("c1"), "%"));

Review Comment:
   🤔 the original rewrite rule looks incorrect to me (as in the orignal rule shouldn't be rewriting regexp match). We can fix this in some other PR
   
   ```sql
   postgres=# select regexp_match('foo', '');
    regexp_match
   --------------
    {""}
   (1 row)
   
   postgres=# select like('foo', '%');
    like
   ------
    t
   (1 row)
   ```



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -2851,4 +2838,70 @@ mod tests {
             or(col("c2").lt(lit(3)), col("c2").gt(lit(4)))
         );
     }
+
+    #[test]
+    fn test_like_and_ilke() {
+        // test non-null values
+        let expr = Expr::Like(Like::new(
+            false,
+            Box::new(col("c1")),
+            Box::new(lit("%")),
+            None,
+        ));

Review Comment:
   What do you think about using the expr_fn's style for creating `like` https://docs.rs/datafusion/latest/datafusion/prelude/enum.Expr.html#method.like
   
   ```suggestion
           let expr = like(col("c1"), lit("%"));
   ```
   
   I think that would make it clearer what cases these tests were covering



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org