You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "crepererum (via GitHub)" <gi...@apache.org> on 2023/05/17 08:17:02 UTC

[GitHub] [arrow-datafusion] crepererum commented on a diff in pull request #6369: feat: add pattern for simplifying exprs like `str ~ '^foo$'`

crepererum commented on code in PR #6369:
URL: https://github.com/apache/arrow-datafusion/pull/6369#discussion_r1196117383


##########
datafusion/optimizer/src/simplify_expressions/regex.rs:
##########
@@ -130,6 +139,46 @@ fn is_safe_for_like(c: char) -> bool {
     (c != '%') && (c != '_')
 }
 
+/// returns true if the elements in a `Concat` pattern are:
+/// - `[Look::Start, Look::End]`
+/// - `[Look::Start, Literal(_), Look::End]`
+fn is_anchored_literal(v: &[Hir]) -> bool {
+    match v.len() {
+        2..=3 => (),
+        _ => return false,
+    };
+
+    let first_last = (
+        v.first().expect("length checked"),
+        v.last().expect("length checked"),
+    );
+    if !matches!(first_last,
+    (s, e) if s.kind() == &HirKind::Look(Look::Start)
+        && e.kind() == &HirKind::Look(Look::End)
+         )
+    {
+        return false;
+    }
+
+    v.iter()
+        .skip(1)
+        .take(v.len() - 2)
+        .all(|h| matches!(h.kind(), HirKind::Literal(_)))
+}
+
+/// extracts a string literal expression assuming that `is_anchored_literal()`

Review Comment:
   ```suggestion
   /// extracts a string literal expression assuming that [`is_anchored_literal`]
   ```



##########
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##########
@@ -2434,6 +2434,21 @@ mod tests {
         // single word
         assert_change(regex_match(col("c1"), lit("foo")), like(col("c1"), "%foo%"));
 
+        // regular expressions that match an exact literal
+        assert_change(regex_match(col("c1"), lit("^$")), col("c1").eq(lit("")));
+        assert_change(
+            regex_not_match(col("c1"), lit("^$")),
+            col("c1").not_eq(lit("")),
+        );
+        assert_change(
+            regex_match(col("c1"), lit("^foo$")),
+            col("c1").eq(lit("foo")),
+        );
+        assert_change(
+            regex_not_match(col("c1"), lit("^foo$")),
+            col("c1").not_eq(lit("foo")),
+        );

Review Comment:
   Could we have another "unsupported" test here for (use `assert_no_change`, see other cases in this test):
   
   - `^foo|bar$` (OR within anchor)
   - `^(foo)(bar)$` (I think this is a concat or more than 3 elements)
   - `^` and `$` (single element)
   - `$^` and `$foo^` (anchors flipped)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org