You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/02 20:46:27 UTC

[GitHub] [iceberg] rdblue opened a new pull request, #4947: API: Add expression equivalence testing

rdblue opened a new pull request, #4947:
URL: https://github.com/apache/iceberg/pull/4947

   This adds `ExpressionUtil.equivalent` and `ExpressionUtil.selectsPartitions` to determine whether an expression will select whole files in a partition spec. This is needed to determine whether a filter is guaranteed to delete only whole files, or whether a row-level plan should be used. It is also needed to implement aggregate pushdown with non-trivial filters because filters must match whole files in order to use file metadata to satisfy a query.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889332340


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {
+    return Binder.bind(struct, Expressions.rewriteNot(left))
+        .isEquivalentTo(Binder.bind(struct, Expressions.rewriteNot(right)));
+  }
+
+  /**
+   * Returns whether an expression selects whole partitions for a partition spec.
+   * <p>
+   * For example, ts &lt; '2021-03-09T10:00:00.000' selects whole partitions in an hourly spec, [hours(ts)], but does
+   * not select whole partitions in a daily spec, [days(ts)].
+   *
+   * @param expr an unbound expression
+   * @param spec a partition spec
+   * @return true if the expression will select whole partitions in the given spec
+   */
+  public static boolean selectsPartitions(Expression expr, PartitionSpec spec) {

Review Comment:
   Yes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#issuecomment-1148037875

   @wypoon, I merged a PR that added `BoundTerm.isEquivalentTo` earlier today: https://github.com/apache/iceberg/commit/c9efc4b8ccdaf047f32dc134096f68cb7133c545
   
   @kbendick, can you make sure we're running the checks on PRs correctly?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889312374


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {
+    return Binder.bind(struct, Expressions.rewriteNot(left))
+        .isEquivalentTo(Binder.bind(struct, Expressions.rewriteNot(right)));
+  }
+
+  /**
+   * Returns whether an expression selects whole partitions for a partition spec.
+   * <p>
+   * For example, ts &lt; '2021-03-09T10:00:00.000' selects whole partitions in an hourly spec, [hours(ts)], but does
+   * not select whole partitions in a daily spec, [days(ts)].
+   *
+   * @param expr an unbound expression
+   * @param spec a partition spec
+   * @return true if the expression will select whole partitions in the given spec
+   */
+  public static boolean selectsPartitions(Expression expr, PartitionSpec spec) {

Review Comment:
   Am I correct this should allow us to optimize metadata-only deletes by handling obvious cases without the need to plan files with potential matches and invoking evaluators?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889290673


##########
api/src/main/java/org/apache/iceberg/expressions/BoundLiteralPredicate.java:
##########
@@ -76,6 +86,53 @@ public boolean test(T value) {
     }
   }
 
+  @Override
+  @SuppressWarnings("unchecked")
+  public boolean isEquivalentTo(Expression expr) {
+    if (op() == expr.op()) {
+      BoundLiteralPredicate<?> other = (BoundLiteralPredicate<?>) expr;
+      if (term().isEquivalentTo(other.term())) {
+        // because the term is equivalent, the literal must have the same type, T
+        Literal<T> otherLiteral = (Literal<T>) other.literal();
+        Comparator<T> cmp = literal.comparator();
+        return cmp.compare(literal.value(), otherLiteral.value()) == 0;
+      }
+
+    } else if (expr instanceof BoundLiteralPredicate) {
+      BoundLiteralPredicate<?> other = (BoundLiteralPredicate<?>) expr;
+      if (INTEGRAL_TYPES.contains(term().type().typeId()) && term().isEquivalentTo(other.term())) {
+        switch (op()) {
+          case LT:
+            if (other.op() == Operation.LT_EQ) {
+              // < 6 is equivalent to <= 5
+              return toLong(literal()) == toLong(other.literal()) + 1L;
+            }
+            break;
+          case LT_EQ:
+            if (other.op() == Operation.LT) {
+              // <= 5 is equivalent to < 6
+              return toLong(literal()) == toLong(other.literal()) - 1L;
+            }
+            break;
+          case GT:
+            if (other.op() == Operation.GT_EQ) {
+              // > 5 is equivalent to >= 6
+              return toLong(literal()) == toLong(other.literal()) - 1L;
+            }
+            break;
+          case GT_EQ:
+            if (other.op() == Operation.GT) {
+              // >= 5 is equivalent to > 5

Review Comment:
   Fixed.



##########
api/src/main/java/org/apache/iceberg/expressions/BoundTransform.java:
##########
@@ -57,6 +57,18 @@ public Type type() {
     return transform.getResultType(ref.type());
   }
 
+  @Override
+  public boolean isEquivalentTo(BoundTerm<?> other) {
+    if (other instanceof BoundTransform) {
+      BoundTransform<?, ?> bound = (BoundTransform<?, ?>) other;
+      return ref.isEquivalentTo(bound.ref()) && transform.equals(((BoundTransform<?, ?>) other).transform());

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #4947:
URL: https://github.com/apache/iceberg/pull/4947


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#issuecomment-1146443103

   Thanks for reviewing, @aokolnychyi!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
wypoon commented on PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#issuecomment-1148040089

   > @wypoon, I merged a PR that added `BoundTerm.isEquivalentTo` earlier today: [c9efc4b](https://github.com/apache/iceberg/commit/c9efc4b8ccdaf047f32dc134096f68cb7133c545)
   > 
   Thanks. I haven't picked up #4965. I'll do so then.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889293145


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {

Review Comment:
   Do we care about case sensitivity in any of the methods?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889286645


##########
api/src/main/java/org/apache/iceberg/expressions/Expression.java:
##########
@@ -124,4 +124,21 @@ public Operation flipLR() {
   default Expression negate() {
     throw new UnsupportedOperationException(String.format("%s cannot be negated", this));
   }
+
+  /**
+   * Returns whether this expression will accept the same values as another.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   * <p>
+   * For best results, rewrite not and bind expressions before calling this method.
+   *
+   * @param other another expression
+   * @return true if the expressions are equivalent
+   */
+  default boolean isEquivalentTo(Expression other) {

Review Comment:
   I typically try to make the method name form a natural sentence in English, like `if (someFilter.isEquivalentTo(other))` reads like "if some filter is equivalent to other". In some cases it makes more sense to use other words, like `has`.
   
   I wasn't sure what to do with `ExpressionUtil.equivalent` since it compares two things, but `areEquivalent(f1, f2, schema)` doesn't feel right. I just left that as `ExpressionUtil.equivalent`. Suggestions are welcome!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r888408348


##########
api/src/main/java/org/apache/iceberg/expressions/UnboundPredicate.java:
##########
@@ -259,7 +259,7 @@ public String toString() {
   @SuppressWarnings("unchecked")
   static <T> Set<T> setOf(Iterable<Literal<T>> literals) {
     Literal<T> lit = Iterables.get(literals, 0);
-    if (lit instanceof Literals.StringLiteral && lit.value() instanceof CharSequence) {
+    if (lit instanceof Literals.StringLiteral) {

Review Comment:
   This is always true because `StringLiteral` wraps a `CharSequence`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889293144


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {

Review Comment:
   Do we care about case sensitivity in any of the methods?



##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {

Review Comment:
   Do we care about case sensitivity in any of the methods?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889346113


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,41 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @param caseSensitive whether to bind expressions using case-sensitive matching
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct, boolean caseSensitive) {
+    return Binder.bind(struct, Expressions.rewriteNot(left), caseSensitive)
+        .isEquivalentTo(Binder.bind(struct, Expressions.rewriteNot(right), caseSensitive));
+  }
+
+  /**
+   * Returns whether an expression selects whole partitions for a partition spec.
+   * <p>
+   * For example, ts &lt; '2021-03-09T10:00:00.000' selects whole partitions in an hourly spec, [hours(ts)], but does
+   * not select whole partitions in a daily spec, [days(ts)].
+   *
+   * @param expr an unbound expression
+   * @param spec a partition spec
+   * @return true if the expression will select whole partitions in the given spec
+   */
+  public static boolean selectsPartitions(Expression expr, PartitionSpec spec, boolean caseSensitive) {
+    return equivalent(
+        Projections.inclusive(spec).project(expr),

Review Comment:
   Fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889336305


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {
+    return Binder.bind(struct, Expressions.rewriteNot(left))
+        .isEquivalentTo(Binder.bind(struct, Expressions.rewriteNot(right)));
+  }
+
+  /**
+   * Returns whether an expression selects whole partitions for a partition spec.
+   * <p>
+   * For example, ts &lt; '2021-03-09T10:00:00.000' selects whole partitions in an hourly spec, [hours(ts)], but does
+   * not select whole partitions in a daily spec, [days(ts)].
+   *
+   * @param expr an unbound expression
+   * @param spec a partition spec
+   * @return true if the expression will select whole partitions in the given spec
+   */
+  public static boolean selectsPartitions(Expression expr, PartitionSpec spec) {

Review Comment:
   Cool, I'll follow up with that once this is merged.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889341928


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,41 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @param caseSensitive whether to bind expressions using case-sensitive matching
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct, boolean caseSensitive) {
+    return Binder.bind(struct, Expressions.rewriteNot(left), caseSensitive)
+        .isEquivalentTo(Binder.bind(struct, Expressions.rewriteNot(right), caseSensitive));
+  }
+
+  /**
+   * Returns whether an expression selects whole partitions for a partition spec.
+   * <p>
+   * For example, ts &lt; '2021-03-09T10:00:00.000' selects whole partitions in an hourly spec, [hours(ts)], but does
+   * not select whole partitions in a daily spec, [days(ts)].
+   *
+   * @param expr an unbound expression
+   * @param spec a partition spec
+   * @return true if the expression will select whole partitions in the given spec
+   */
+  public static boolean selectsPartitions(Expression expr, PartitionSpec spec, boolean caseSensitive) {
+    return equivalent(
+        Projections.inclusive(spec).project(expr),

Review Comment:
   Shall we pass `caseSensitive` to `Projections` calls too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889340903


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {

Review Comment:
   Added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889283508


##########
api/src/main/java/org/apache/iceberg/expressions/Expression.java:
##########
@@ -124,4 +124,21 @@ public Operation flipLR() {
   default Expression negate() {
     throw new UnsupportedOperationException(String.format("%s cannot be negated", this));
   }
+
+  /**
+   * Returns whether this expression will accept the same values as another.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   * <p>
+   * For best results, rewrite not and bind expressions before calling this method.
+   *
+   * @param other another expression
+   * @return true if the expressions are equivalent
+   */
+  default boolean isEquivalentTo(Expression other) {

Review Comment:
   In which cases we use `is` prefix for boolean methods and in which no?
   We are not very consistent in the code right now.



##########
api/src/main/java/org/apache/iceberg/expressions/Expression.java:
##########
@@ -124,4 +124,21 @@ public Operation flipLR() {
   default Expression negate() {
     throw new UnsupportedOperationException(String.format("%s cannot be negated", this));
   }
+
+  /**
+   * Returns whether this expression will accept the same values as another.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   * <p>
+   * For best results, rewrite not and bind expressions before calling this method.
+   *
+   * @param other another expression
+   * @return true if the expressions are equivalent
+   */
+  default boolean isEquivalentTo(Expression other) {

Review Comment:
   In which cases we use `is` prefix for boolean methods and in which no?
   We are not very consistent in the code right now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] singhpk234 commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r888645022


##########
api/src/main/java/org/apache/iceberg/expressions/BoundLiteralPredicate.java:
##########
@@ -76,6 +86,53 @@ public boolean test(T value) {
     }
   }
 
+  @Override
+  @SuppressWarnings("unchecked")
+  public boolean isEquivalentTo(Expression expr) {
+    if (op() == expr.op()) {
+      BoundLiteralPredicate<?> other = (BoundLiteralPredicate<?>) expr;
+      if (term().isEquivalentTo(other.term())) {
+        // because the term is equivalent, the literal must have the same type, T
+        Literal<T> otherLiteral = (Literal<T>) other.literal();
+        Comparator<T> cmp = literal.comparator();
+        return cmp.compare(literal.value(), otherLiteral.value()) == 0;
+      }
+
+    } else if (expr instanceof BoundLiteralPredicate) {
+      BoundLiteralPredicate<?> other = (BoundLiteralPredicate<?>) expr;
+      if (INTEGRAL_TYPES.contains(term().type().typeId()) && term().isEquivalentTo(other.term())) {
+        switch (op()) {
+          case LT:
+            if (other.op() == Operation.LT_EQ) {
+              // < 6 is equivalent to <= 5
+              return toLong(literal()) == toLong(other.literal()) + 1L;
+            }
+            break;
+          case LT_EQ:
+            if (other.op() == Operation.LT) {
+              // <= 5 is equivalent to < 6
+              return toLong(literal()) == toLong(other.literal()) - 1L;
+            }
+            break;
+          case GT:
+            if (other.op() == Operation.GT_EQ) {
+              // > 5 is equivalent to >= 6
+              return toLong(literal()) == toLong(other.literal()) - 1L;
+            }
+            break;
+          case GT_EQ:
+            if (other.op() == Operation.GT) {
+              // >= 5 is equivalent to > 5

Review Comment:
   [minor] should this be > 4
   ```suggestion
                 // >= 5 is equivalent to > 4
   ```



##########
api/src/main/java/org/apache/iceberg/expressions/BoundTransform.java:
##########
@@ -57,6 +57,18 @@ public Type type() {
     return transform.getResultType(ref.type());
   }
 
+  @Override
+  public boolean isEquivalentTo(BoundTerm<?> other) {
+    if (other instanceof BoundTransform) {
+      BoundTransform<?, ?> bound = (BoundTransform<?, ?>) other;
+      return ref.isEquivalentTo(bound.ref()) && transform.equals(((BoundTransform<?, ?>) other).transform());

Review Comment:
   [minor] 
   ```suggestion
         return ref.isEquivalentTo(bound.ref()) && transform.equals(bound.transform());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#discussion_r889332007


##########
api/src/main/java/org/apache/iceberg/expressions/ExpressionUtil.java:
##########
@@ -66,6 +67,40 @@ public static String toSanitizedString(Expression expr) {
     return ExpressionVisitors.visit(expr, StringSanitizer.INSTANCE);
   }
 
+  /**
+   * Returns whether two unbound expressions will accept the same inputs.
+   * <p>
+   * If this returns true, the expressions are guaranteed to return the same evaluation for the same input. However, if
+   * this returns false the expressions may return the same evaluation for the same input. That is, expressions may
+   * be equivalent even if this returns false.
+   *
+   * @param left an unbound expression
+   * @param right an unbound expression
+   * @param struct a struct type for binding
+   * @return true if the expressions are equivalent
+   */
+  public static boolean equivalent(Expression left, Expression right, Types.StructType struct) {

Review Comment:
   Maybe? My thought was that since this is binding both expressions, the important thing is that we use the setting consistently. But you're right that we should probably just pass it in because the expressions may not match the struct.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #4947: API: Add expression equivalence testing

Posted by GitBox <gi...@apache.org>.
wypoon commented on PR #4947:
URL: https://github.com/apache/iceberg/pull/4947#issuecomment-1148015646

   @rdblue after I picked up this commit in master, my local build fails with
   ```
   Execution failed for task ':iceberg-api:revapi'.
   > There were Java public API/ABI breaks reported by revapi:
     
     java.method.addedToInterface: Method was added to an interface.
     
     old: <none>
     new: method boolean org.apache.iceberg.expressions.BoundTerm<T>::isEquivalentTo(org.apache.iceberg.expressions.BoundTerm<?>)
     
     SOURCE: BREAKING, BINARY: NON_BREAKING, SEMANTIC: POTENTIALLY_BREAKING
     
     From old archive: <none>
     From new archive: iceberg-api-0.14.0-SNAPSHOT.jar
     
     If this is an acceptable break that will not harm your users, you can ignore it in future runs like so for:
     
       * Just this break:
           ./gradlew :iceberg-api:revapiAcceptBreak --justification "{why this break is ok}" \
             --code "java.method.addedToInterface" \
             --new "method boolean org.apache.iceberg.expressions.BoundTerm<T>::isEquivalentTo(org.apache.iceberg.expressions.BoundTerm<?>)"
       * All breaks in this project:
           ./gradlew :iceberg-api:revapiAcceptAllBreaks --justification "{why this break is ok}"
       * All breaks in all projects:
           ./gradlew revapiAcceptAllBreaks --justification "{why this break is ok}"
     ----------------------------------------------------------------------------------------------------
   ```
   Do you need to add something to .palantir/revapi.yml?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org