You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/05 15:44:29 UTC

[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866056903


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,33 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains predicate expression branch, so not all branch will be hit
+ * at runtime. All optimization should be careful with the evaluation order.
+ */
+trait ConditionalExpression extends Expression {
+  /**
+   * Return the expression which can always be hit at runtime, For example:
+   * 1. If: common subexpressions will always be evaluated at the beginning, but the true and
+   *        false expressions in `If` may not get accessed, according to the predicate
+   *        expression. We should only return the predicate expression.
+   * 2. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
+   *              condition. We should only return the first condition expression as it
+   *              will always get accessed.
+   * 3. Coalesce: it's also a conditional expression, we should only return the first child,
+   *              because others may not get accessed.
+   * 4. NaNvl: we can only guarantee the left child can be always accessed.
+   *           And if we hit the left child, the right will not be accessed.
+   */
+  def head: Expression = children.head

Review Comment:
   Since we are doing abstraction here, let's make it more general. I think there are 2 things we need to care about for a condition expression:
   1. inputs that will always be evaluated. Today all conditional expressions only have one input that will always be evaluated, but I don't see why it can't be a `Seq`. How about `def alwaysEvaluatedInputs: Seq[Expression]`
   2. groups of branches. For each group, at least one branch will be hit at runtime, so that we can eagerly evaluate the common expressions for a group. How about `def branchGroups: Seq[Seq[Expression]]`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org