You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/01 08:35:24 UTC

[GitHub] [spark] LuciferYang opened a new pull request #35694: [SPARK-38360][SQL] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

LuciferYang opened a new pull request #35694:
URL: https://github.com/apache/spark/pull/35694


   ### What changes were proposed in this pull request?
   There are many duplicate code patterns in Spark code:
   
   ```scala
   treeNode.find(condition).isDefined 
   treeNode.find(condition).nonEmpty 
   treeNode.find(condition).isEmpty
   ```
   
   This pr Introduce a `exists` function for `TreeNode` to simplify them, after this pr:
   
   - `treeNode.find(condition).isDefined ` and `treeNode.find(condition).nonEmpty ` -> `treeNode.exists(condition)`
   - `treeNode.find(condition).isEmpty` -> `!treeNode.exists(condition)`
   
   
   ### Why are the changes needed?
   Code simplification
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Pass GA and add a new UT for new function.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822278258



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
##########
@@ -246,6 +246,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
     children.foldLeft(Option.empty[BaseType]) { (l, r) => l.orElse(r.find(f)) }
   }
 
+  /**
+   * Test whether there is [[TreeNode]] satisfies the conditions specified in `f`.
+   * The condition is recursively applied to this node and all of its children (pre-order).
+   */
+  def exists(f: BaseType => Boolean): Boolean = if (f(this)) {

Review comment:
       Like the following?
   
   ```scala
   def exists(pf: PartialFunction[BaseType, Boolean]): Boolean = {
       if (pf.applyOrElse(this, { _: Any => false})) {
         true
       } else {
         children.exists(c => pf.applyOrElse(c, { _: Any => false}))
       }
     }
   ```
   
   This implementation is more like `PartialFunction#cond` as follows:
   
   ```scala
    /** Creates a Boolean test based on a value and a partial function.
      *  It behaves like a 'match' statement with an implied 'case _ => false'
      *  following the supplied cases.
      *
      *  @param  x   the value to test
      *  @param  pf  the partial function
      *  @return true, iff `x` is in the domain of `pf` and `pf(x) == true`.
      */
     def cond[T](x: T)(pf: PartialFunction[T, Boolean]): Boolean = pf.applyOrElse(x, constFalse)
   ```
   
   This requires the assumption that `case _ => false`
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r823422170



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1690,11 +1690,10 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe
    */
   private def canPushThroughCondition(plan: LogicalPlan, condition: Expression): Boolean = {
     val attributes = plan.outputSet
-    val matched = condition.find {
-      case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
-      case _ => false
+    condition.exists {

Review comment:
       this one looks a bit confusing. Does it really match the previous logic?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822356000



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
##########
@@ -246,6 +246,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
     children.foldLeft(Option.empty[BaseType]) { (l, r) => l.orElse(r.find(f)) }
   }
 
+  /**
+   * Test whether there is [[TreeNode]] satisfies the conditions specified in `f`.
+   * The condition is recursively applied to this node and all of its children (pre-order).
+   */
+  def exists(f: BaseType => Boolean): Boolean = if (f(this)) {

Review comment:
       ~maybe a little simpler like this:~
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822368550



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -509,7 +509,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
 
     private def hasUnresolvedAlias(exprs: Seq[NamedExpression]) =
-      exprs.exists(_.find(_.isInstanceOf[UnresolvedAlias]).isDefined)
+      exprs.exists(_.exists(_.isInstanceOf[UnresolvedAlias]))

Review comment:
       OK




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r823454421



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1690,11 +1690,10 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe
    */
   private def canPushThroughCondition(plan: LogicalPlan, condition: Expression): Boolean = {
     val attributes = plan.outputSet
-    val matched = condition.find {
-      case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
-      case _ => false
+    condition.exists {

Review comment:
       I revert this to:
   
   ```scala
   !condition.exists {
         case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
         case _ => false
       }
   ```
   @cloud-fan 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822356000



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
##########
@@ -246,6 +246,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
     children.foldLeft(Option.empty[BaseType]) { (l, r) => l.orElse(r.find(f)) }
   }
 
+  /**
+   * Test whether there is [[TreeNode]] satisfies the conditions specified in `f`.
+   * The condition is recursively applied to this node and all of its children (pre-order).
+   */
+  def exists(f: BaseType => Boolean): Boolean = if (f(this)) {

Review comment:
       maybe a little simpler like this:
   ```
     def exists(pf: PartialFunction[BaseType, Boolean]): Boolean = {
       val lifted = pf.lift
       lifted(this).getOrElse {
         children.foldLeft(false) { (l, r) => l || r.exists(pf) }
       }
     }
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r823438839



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1690,11 +1690,10 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe
    */
   private def canPushThroughCondition(plan: LogicalPlan, condition: Expression): Boolean = {
     val attributes = plan.outputSet
-    val matched = condition.find {
-      case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
-      case _ => false
+    condition.exists {

Review comment:
       hmm... there is something wrong with this derivation process
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822370498



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -509,7 +509,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
 
     private def hasUnresolvedAlias(exprs: Seq[NamedExpression]) =
-      exprs.exists(_.find(_.isInstanceOf[UnresolvedAlias]).isDefined)
+      exprs.exists(_.exists(_.isInstanceOf[UnresolvedAlias]))

Review comment:
       Let's wait for more suggestions from others




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1063565736


   Yeah. I personally find `find(...).isEmpty` is easier to read vs. `exists`. But don't feel strongly too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1064022381


   > There are also some patterns like treeNode.collectFirst(condition).isDefined, but I think they can be put into a follow-up.
   
   Thanks ~ I'll find them and add a followup
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822278258



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
##########
@@ -246,6 +246,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
     children.foldLeft(Option.empty[BaseType]) { (l, r) => l.orElse(r.find(f)) }
   }
 
+  /**
+   * Test whether there is [[TreeNode]] satisfies the conditions specified in `f`.
+   * The condition is recursively applied to this node and all of its children (pre-order).
+   */
+  def exists(f: BaseType => Boolean): Boolean = if (f(this)) {

Review comment:
       Like the following?
   
   ```scala
   def exists(pf: PartialFunction[BaseType, Boolean]): Boolean = {
       if (pf.applyOrElse(this, { _: Any => false})) {
         true
       } else {
         children.exists(c => pf.applyOrElse(c, { _: Any => false}))
       }
     }
   ```
   
   This implementation is more like `PartialFunction#cond`:
   
   ```scala
    /** Creates a Boolean test based on a value and a partial function.
      *  It behaves like a 'match' statement with an implied 'case _ => false'
      *  following the supplied cases.
      *
      *  @param  x   the value to test
      *  @param  pf  the partial function
      *  @return true, iff `x` is in the domain of `pf` and `pf(x) == true`.
      */
     def cond[T](x: T)(pf: PartialFunction[T, Boolean]): Boolean = pf.applyOrElse(x, constFalse)
   ```
   
   This requires the assumption that `case _ => false`
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1063592168


   > Yeah. I personally find `find(...).isEmpty` is easier to read vs. `exists`. But don't feel strongly too.
   
   Yes, everyone may feel different. I respect your decision
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1063752062


   looks fine as it matches the scala collection API


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822350214



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -509,7 +509,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
 
     private def hasUnresolvedAlias(exprs: Seq[NamedExpression]) =
-      exprs.exists(_.find(_.isInstanceOf[UnresolvedAlias]).isDefined)
+      exprs.exists(_.exists(_.isInstanceOf[UnresolvedAlias]))

Review comment:
       yes, If we apply this PartialFunction, call sites like this need to be rewritten.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r823431908



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1690,11 +1690,10 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe
    */
   private def canPushThroughCondition(plan: LogicalPlan, condition: Expression): Boolean = {
     val attributes = plan.outputSet
-    val matched = condition.find {
-      case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
-      case _ => false
+    condition.exists {

Review comment:
       This is how I derived this, the original logic same as:
   
   ```scala
   condition.find {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => false
   }.isEmpty
   ```
   It is equivalent to
   
   ```scala
   !condition.find {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => false
   }.noEmpty
   ```
   
   -->
   
   ```scala
   !condition.exists {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => false
   }
   ```
   
   ->
   ```scala
   condition.exists {
     case s: SubqueryExpression => !s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => !false
   }
   ```
   
   ->
   ```scala
   condition.exists {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).isEmpty
     case _ => true
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822281738



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -509,7 +509,7 @@ class Analyzer(override val catalogManager: CatalogManager)
     }
 
     private def hasUnresolvedAlias(exprs: Seq[NamedExpression]) =
-      exprs.exists(_.find(_.isInstanceOf[UnresolvedAlias]).isDefined)
+      exprs.exists(_.exists(_.isInstanceOf[UnresolvedAlias]))

Review comment:
       @zhengruifeng In addition, if `f` becomes `PartialFunction`, there may need to be rewritten as
   ```
   exprs.exists(_.exists { case _: UnresolvedAlias => true})
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1063591878


   > Hmm, I'm not sure about how much it reduces duplicate code. Seems not too much? Looks like you only eliminate `isDefined`, `isEmpty`, `nonEmpty`.
   > 
   > I'm okay with the change.
   
   Yes, just a small optimization


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822278258



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
##########
@@ -246,6 +246,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
     children.foldLeft(Option.empty[BaseType]) { (l, r) => l.orElse(r.find(f)) }
   }
 
+  /**
+   * Test whether there is [[TreeNode]] satisfies the conditions specified in `f`.
+   * The condition is recursively applied to this node and all of its children (pre-order).
+   */
+  def exists(f: BaseType => Boolean): Boolean = if (f(this)) {

Review comment:
       Like
   ```scala
   def exists(pf: PartialFunction[BaseType, Boolean]): Boolean = {
       if (pf.applyOrElse(this, { _: Any => false})) {
         true
       } else {
         children.exists(c => pf.applyOrElse(c, { _: Any => false}))
       }
     }
   ```
   
   This implementation is more like `PartialFunction#cond` as follows:
   
   ```scala
    /** Creates a Boolean test based on a value and a partial function.
      *  It behaves like a 'match' statement with an implied 'case _ => false'
      *  following the supplied cases.
      *
      *  @param  x   the value to test
      *  @param  pf  the partial function
      *  @return true, iff `x` is in the domain of `pf` and `pf(x) == true`.
      */
     def cond[T](x: T)(pf: PartialFunction[T, Boolean]): Boolean = pf.applyOrElse(x, constFalse)
   ```
   
   This requires the assumption that `case _ => false`
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1062625738


   cc @cloud-fan @viirya @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan closed pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

cloud-fan closed pull request #35694:
URL: https://github.com/apache/spark/pull/35694


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1064041414


   thanks all


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1064023868


   https://github.com/LuciferYang/spark/runs/5493065701
   
   <img width="852" alt="image" src="https://user-images.githubusercontent.com/1475305/157665797-740ba0d5-3e88-41cf-be28-f36c84198717.png">
   
   Seems GA passed but hang...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cloud-fan commented on pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

cloud-fan commented on pull request #35694:
URL: https://github.com/apache/spark/pull/35694#issuecomment-1064035142


   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] LuciferYang commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r823431908



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1690,11 +1690,10 @@ object PushPredicateThroughNonJoin extends Rule[LogicalPlan] with PredicateHelpe
    */
   private def canPushThroughCondition(plan: LogicalPlan, condition: Expression): Boolean = {
     val attributes = plan.outputSet
-    val matched = condition.find {
-      case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
-      case _ => false
+    condition.exists {

Review comment:
       This is how I derived this, the original logic same as:
   
   ```scala
   condition.find {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => false
   }.isEmpty
   ```
   It is equivalent to
   
   ```scala
   !condition.find {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => false
   }.noEmpty
   ```
   
   -->
   
   ```scala
   !condition.exists {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => false
   }
   ```
   
   ->
   ```
   condition.exists {
     case s: SubqueryExpression => !s.plan.outputSet.intersect(attributes).nonEmpty
     case _ => !false
   }
   ```
   
   ->
   ```
   condition.exists {
     case s: SubqueryExpression => s.plan.outputSet.intersect(attributes).isEmpty
     case _ => true
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] zhengruifeng commented on a change in pull request #35694: [SPARK-38360][SQL][SS][PYTHON] Introduce a `exists` function for `TreeNode` to eliminate duplicate code patterns

Posted by GitBox <gi...@apache.org>.

zhengruifeng commented on a change in pull request #35694:
URL: https://github.com/apache/spark/pull/35694#discussion_r822254576



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
##########
@@ -246,6 +246,16 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product with Tre
     children.foldLeft(Option.empty[BaseType]) { (l, r) => l.orElse(r.find(f)) }
   }
 
+  /**
+   * Test whether there is [[TreeNode]] satisfies the conditions specified in `f`.
+   * The condition is recursively applied to this node and all of its children (pre-order).
+   */
+  def exists(f: BaseType => Boolean): Boolean = if (f(this)) {

Review comment:
       is it possible to make `f` a `PartialFunction[BaseType, Boolean]` and futher simplify some places like 
   ```
       e.exists {
         case s: ScalarSubquery => s.isCorrelated
         case _ => false
       }
   ```
   
   to
   
   ```
   e.exists {  case s: ScalarSubquery => s.isCorrelated    }
   ```
   
   ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org