Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/05 10:38:29 UTC

[GitHub] [spark] ulysses-you opened a new pull request, #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

ulysses-you opened a new pull request, #36455:
URL: https://github.com/apache/spark/pull/36455

   
   ### What changes were proposed in this pull request?
   Add `ConditionalExpression` trait.
   
   ### Why are the changes needed?
   For developers, if a custom conditional-like expression contains common subexpressions, its evaluation order may change, because Spark pulls out and evaluates the common subexpressions first during execution.
   
   Adding a `ConditionalExpression` trait makes this case easier for developers to handle.
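
To illustrate the concern, here is a minimal sketch using a hypothetical toy expression model (`Lit`, `Add`, `Div`, `IsZero`, `IfExpr` are made up for this example and are not Catalyst classes). It assumes a guard that protects a repeated division: evaluated in order, the division is never reached, but evaluating the shared subexpression up front fails.

```scala
object EvalOrderSketch {
  // Hypothetical toy expressions for illustration; these are not Catalyst classes.
  sealed trait Expr
  case class Lit(value: Int) extends Expr
  case class Add(left: Expr, right: Expr) extends Expr
  case class Div(left: Expr, right: Expr) extends Expr
  case class IsZero(child: Expr) extends Expr
  case class IfExpr(predicate: Expr, trueValue: Expr, falseValue: Expr) extends Expr

  // Plain recursive evaluation; booleans are encoded as 1 (true) and 0 (false).
  def eval(e: Expr): Int = e match {
    case Lit(v) => v
    case Add(l, r) => eval(l) + eval(r)
    case Div(l, r) => eval(l) / eval(r) // throws ArithmeticException if r evaluates to 0
    case IsZero(c) => if (eval(c) == 0) 1 else 0
    case IfExpr(p, t, f) => if (eval(p) != 0) eval(t) else eval(f)
  }

  val divisor: Expr = Lit(0)
  // `shared` appears twice, but only inside the false branch of the conditional.
  val shared: Expr = Div(Lit(10), divisor)
  val guarded: Expr = IfExpr(IsZero(divisor), Lit(0), Add(shared, shared))

  // In-order evaluation: the predicate is true, so the division is never reached.
  def inOrder: Int = eval(guarded)

  // Naive hoisting: evaluate the common subexpression before the conditional.
  def hoisted: scala.util.Try[Int] = scala.util.Try {
    eval(shared)
    eval(guarded)
  }
}
```

With these definitions `EvalOrderSketch.inOrder` succeeds while `EvalOrderSketch.hoisted` is a `Failure`, which is the kind of reordering a custom conditional expression needs to be protected from.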
   
   ### Does this PR introduce _any_ user-facing change?
   No, this only adds a new trait.
   
   ### How was this patch tested?
   Passes existing tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866405428


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,33 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains predicate expression branch, so not all branch will be hit

Review Comment:
   ```suggestion
    * An expression that contains conditional expression branches, so not all branches will be hit
   ```





[GitHub] [spark] ulysses-you commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on PR #36455:
URL: https://github.com/apache/spark/pull/36455#issuecomment-1118495383

   cc @cloud-fan @viirya 




[GitHub] [spark] cloud-fan closed pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait
URL: https://github.com/apache/spark/pull/36455




[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866433341


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala:
##########
@@ -127,56 +127,18 @@ class EquivalentExpressions {
 
   // There are some special expressions that we should not recurse into all of its children.
   //   1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
-  //   2. If: common subexpressions will always be evaluated at the beginning, but the true and
-  //          false expressions in `If` may not get accessed, according to the predicate
-  //          expression. We should only recurse into the predicate expression.
-  //   3. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
-  //                condition. We should only recurse into the first condition expression as it
-  //                will always get accessed.
-  //   4. Coalesce: it's also a conditional expression, we should only recurse into the first
-  //                children, because others may not get accessed.
-  //   5. NaNvl: it's a conditional expression, we can only guarantee the left child can be always
-  //             accessed. And if we hit the left child, the right will not be accessed.
+  //   2. ConditionalExpression: use it's specified expression

Review Comment:
   ```suggestion
     //   2. ConditionalExpression: use its children that will always be evaluated.
   ```
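
As a rough sketch of what the simplified comment describes, the traversal only needs one rule for every conditional expression: recurse into the children that are always evaluated. The model below is a hypothetical toy (`Expr`, `Leaf`, `Add`, `Conditional`, `IfExpr`), and the helper names `childrenToRecurse` and `alwaysEvaluated` are assumptions made for this example, not a copy of the code in `EquivalentExpressions`.

```scala
object RecurseSketch {
  // Hypothetical toy model; not the Catalyst Expression hierarchy.
  trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  // Stand-in for the trait under review: it exposes the always-evaluated children.
  trait Conditional extends Expr {
    def alwaysEvaluatedInputs: Seq[Expr]
  }
  case class IfExpr(predicate: Expr, trueValue: Expr, falseValue: Expr) extends Conditional {
    def children: Seq[Expr] = Seq(predicate, trueValue, falseValue)
    def alwaysEvaluatedInputs: Seq[Expr] = Seq(predicate)
  }

  // One case covers every conditional expression instead of one case per class.
  def childrenToRecurse(e: Expr): Seq[Expr] = e match {
    case c: Conditional => c.alwaysEvaluatedInputs
    case other => other.children
  }

  // All expressions reachable through always-evaluated paths, in pre-order.
  def alwaysEvaluated(e: Expr): Seq[Expr] =
    e +: childrenToRecurse(e).flatMap(alwaysEvaluated)

  val sample: Expr = Add(IfExpr(Leaf("p"), Leaf("a"), Leaf("b")), Leaf("c"))
}
```

For `RecurseSketch.sample`, the traversal visits the predicate `p` and the sibling `c`, but neither branch value of the conditional.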





[GitHub] [spark] dongjoon-hyun commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36455:
URL: https://github.com/apache/spark/pull/36455#issuecomment-1119758119

   +1 for @cloud-fan's backporting decision.
   
   Also, cc @MaxGekk 




[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866056903


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,33 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains predicate expression branch, so not all branch will be hit
+ * at runtime. All optimization should be careful with the evaluation order.
+ */
+trait ConditionalExpression extends Expression {
+  /**
+   * Return the expression which can always be hit at runtime, For example:
+   * 1. If: common subexpressions will always be evaluated at the beginning, but the true and
+   *        false expressions in `If` may not get accessed, according to the predicate
+   *        expression. We should only return the predicate expression.
+   * 2. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
+   *              condition. We should only return the first condition expression as it
+   *              will always get accessed.
+   * 3. Coalesce: it's also a conditional expression, we should only return the first child,
+   *              because others may not get accessed.
+   * 4. NaNvl: we can only guarantee the left child can be always accessed.
+   *           And if we hit the left child, the right will not be accessed.
+   */
+  def head: Expression = children.head

Review Comment:
   Since we are introducing an abstraction here, let's make it more general. I think there are two things we need to care about for a conditional expression:
   1. Inputs that will always be evaluated. Today every conditional expression has only one input that is always evaluated, but I don't see why it can't be a `Seq`. How about `def alwaysEvaluatedInputs: Seq[Expression]`?
   2. Groups of branches. For each group, at least one branch will be hit at runtime, so we can eagerly evaluate the expressions common to all branches of the group. How about `def branchGroups: Seq[Seq[Expression]]`?
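
The two proposed members could look roughly like the sketch below. It uses a hypothetical toy model rather than Catalyst's `Expression`; the `IfExpr` grouping (true and false values in one group) and the helpers `subExprs` and `commonInGroup` are assumptions made for this example. The idea it illustrates: an expression that occurs in every branch of a group is evaluated no matter which branch is taken, so it is safe to compute eagerly.

```scala
object BranchGroupSketch {
  // Hypothetical toy model; not the Catalyst Expression hierarchy.
  trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }
  case class Add(left: Expr, right: Expr) extends Expr {
    def children: Seq[Expr] = Seq(left, right)
  }

  trait Conditional extends Expr {
    // Children that are evaluated on every execution.
    def alwaysEvaluatedInputs: Seq[Expr]
    // Groups of branches; at least one branch of each group is hit at runtime.
    def branchGroups: Seq[Seq[Expr]]
  }

  case class IfExpr(predicate: Expr, trueValue: Expr, falseValue: Expr) extends Conditional {
    def children: Seq[Expr] = Seq(predicate, trueValue, falseValue)
    def alwaysEvaluatedInputs: Seq[Expr] = Seq(predicate)
    // Assumption for this sketch: one of the two values is always evaluated.
    def branchGroups: Seq[Seq[Expr]] = Seq(Seq(trueValue, falseValue))
  }

  // Every subexpression of `e`, including `e` itself.
  def subExprs(e: Expr): Set[Expr] = {
    val nested: Set[Expr] = e.children.flatMap(subExprs).toSet
    nested + e
  }

  // Subexpressions shared by all branches of a group.
  def commonInGroup(group: Seq[Expr]): Set[Expr] =
    if (group.isEmpty) Set.empty[Expr] else group.map(subExprs).reduce(_ intersect _)
}
```

For `IfExpr(p, Add(x, c), Add(x, d))` with `x = Add(a, b)`, `commonInGroup` returns `x` and its leaves, while `c` and `d` stay out because each appears in only one branch.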





[GitHub] [spark] ulysses-you commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866429527


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,33 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains predicate expression branch, so not all branch will be hit
+ * at runtime. All optimization should be careful with the evaluation order.
+ */
+trait ConditionalExpression extends Expression {
+  /**
+   * Return the expression which can always be hit at runtime, For example:
+   * 1. If: common subexpressions will always be evaluated at the beginning, but the true and
+   *        false expressions in `If` may not get accessed, according to the predicate
+   *        expression. We should only return the predicate expression.
+   * 2. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
+   *              condition. We should only return the first condition expression as it
+   *              will always get accessed.
+   * 3. Coalesce: it's also a conditional expression, we should only return the first child,
+   *              because others may not get accessed.
+   * 4. NaNvl: we can only guarantee the left child can be always accessed.
+   *           And if we hit the left child, the right will not be accessed.
+   */
+  def head: Expression = children.head

Review Comment:
   Good call, `Seq[Expression]` also looks good to me.





[GitHub] [spark] viirya commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866406148


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,33 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains predicate expression branch, so not all branch will be hit
+ * at runtime. All optimization should be careful with the evaluation order.
+ */
+trait ConditionalExpression extends Expression {
+  /**
+   * Return the expression which can always be hit at runtime, For example:
+   * 1. If: common subexpressions will always be evaluated at the beginning, but the true and
+   *        false expressions in `If` may not get accessed, according to the predicate
+   *        expression. We should only return the predicate expression.
+   * 2. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
+   *              condition. We should only return the first condition expression as it
+   *              will always get accessed.
+   * 3. Coalesce: it's also a conditional expression, we should only return the first child,
+   *              because others may not get accessed.
+   * 4. NaNvl: we can only guarantee the left child can be always accessed.
+   *           And if we hit the left child, the right will not be accessed.
+   */
+  def head: Expression = children.head

Review Comment:
   +1 for `Seq[Expression]` for 1.





[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866434185


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala:
##########
@@ -48,6 +48,14 @@ case class If(predicate: Expression, trueValue: Expression, falseValue: Expressi
   override def second: Expression = trueValue
   override def third: Expression = falseValue
   override def nullable: Boolean = trueValue.nullable || falseValue.nullable
+  /**
+   * Common subexpressions will always be evaluated at the beginning, but the true and
+   * false expressions in `If` may not get accessed, according to the predicate expression.
+   * We should only return the predicate expression.

Review Comment:
   I think we can simplify the comment now
   ```
   Only the condition expression will always be evaluated.
   ```





[GitHub] [spark] cloud-fan commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36455:
URL: https://github.com/apache/spark/pull/36455#issuecomment-1119344173

   thanks, merging to master/3.3! (I'm backporting this small refactor as we will have a bug fix that relies on it)




[GitHub] [spark] ulysses-you commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on PR #36455:
URL: https://github.com/apache/spark/pull/36455#issuecomment-1119412422

   thank you @cloud-fan @viirya 




[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866057861


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,33 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains predicate expression branch, so not all branch will be hit
+ * at runtime. All optimization should be careful with the evaluation order.
+ */
+trait ConditionalExpression extends Expression {
+  /**
+   * Return the expression which can always be hit at runtime, For example:
+   * 1. If: common subexpressions will always be evaluated at the beginning, but the true and
+   *        false expressions in `If` may not get accessed, according to the predicate
+   *        expression. We should only return the predicate expression.
+   * 2. CaseWhen: like `If`, the children of `CaseWhen` only get accessed in a certain
+   *              condition. We should only return the first condition expression as it
+   *              will always get accessed.
+   * 3. Coalesce: it's also a conditional expression, we should only return the first child,
+   *              because others may not get accessed.
+   * 4. NaNvl: we can only guarantee the left child can be always accessed.

Review Comment:
   let's move the doc to individual expressions.







[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866435684


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala:
##########
@@ -179,6 +187,36 @@ case class CaseWhen(
     }
   }
 
+  /**
+   * Like `If`, the children of `CaseWhen` only get accessed in a certain condition.
+   * We should only return the first condition expression as it will always get accessed.
+   */
+  override def alwaysEvaluatedInputs: Seq[Expression] = children.head :: Nil
+
+  override def branchGroups: Seq[Seq[Expression]] = {
+    // We look at subexpressions in conditions and values of `CaseWhen` separately. It is
+    // because a subexpression in conditions will be run no matter which condition is matched
+    // if it is shared among conditions, but it doesn't need to be shared in values. Similarly,
+    // a subexpression among values doesn't need to be in conditions because no matter which
+    // condition is true, it will be evaluated.
+    val conditions = if (branches.length > 1) {
+      branches.map(_._1)
+    } else {
+      // If there is only one branch, the first condition is already covered by
+      // `head` and we should exclude it here.

Review Comment:
   ```suggestion
         // `alwaysEvaluatedInputs` and we should exclude it here.
   ```





[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866435782


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala:
##########
@@ -66,6 +67,19 @@ case class Coalesce(children: Seq[Expression]) extends ComplexTypeMergingExpress
     }
   }
 
+  /**
+   * We should only return the first child, because others may not get accessed.
+   */
+  override def alwaysEvaluatedInputs: Seq[Expression] = children.head :: Nil
+
+  override def branchGroups: Seq[Seq[Expression]] = if (children.length > 1) {
+    // If there is only one child, the first child is already covered by
+    // `head` and we should exclude it here.

Review Comment:
   ```suggestion
       // `alwaysEvaluatedInputs` and we should exclude it here.
   ```
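
For reference, the single-child rule discussed here can be sketched on a toy model. The quoted diff is cut off before the body of the multi-child case, so returning `Seq(children)` below is an assumption, and `Expr`, `Leaf`, and `CoalesceExpr` are hypothetical stand-ins rather than the Catalyst classes.

```scala
object CoalesceSketch {
  // Hypothetical toy model; not the Catalyst Expression hierarchy.
  trait Expr { def children: Seq[Expr] }
  case class Leaf(name: String) extends Expr {
    def children: Seq[Expr] = Nil
  }

  case class CoalesceExpr(children: Seq[Expr]) extends Expr {
    // Only the first child is guaranteed to be evaluated.
    def alwaysEvaluatedInputs: Seq[Expr] = children.head :: Nil

    // With a single child there is nothing left to group: that child is
    // already covered by `alwaysEvaluatedInputs`.
    // Assumption: with several children, all of them form one group.
    def branchGroups: Seq[Seq[Expr]] =
      if (children.length > 1) Seq(children) else Nil
  }
}
```

A single-child `CoalesceExpr` therefore reports no branch groups, while a two-child one reports one group containing both children.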





[GitHub] [spark] cloud-fan commented on a diff in pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #36455:
URL: https://github.com/apache/spark/pull/36455#discussion_r866433732


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala:
##########
@@ -454,6 +454,23 @@ trait Nondeterministic extends Expression {
   protected def evalInternal(input: InternalRow): Any
 }
 
+/**
+ * An expression that contains conditional expression branches, so not all branches will be hit.
+ * All optimization should be careful with the evaluation order.
+ */
+trait ConditionalExpression extends Expression {
+  /**
+   * Return the expression which can always be hit at runtime.

Review Comment:
   ```suggestion
      * Return the children expressions which can always be hit at runtime.
   ```


