You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "cloud-fan (via GitHub)" <gi...@apache.org> on 2023/02/23 12:13:40 UTC

[GitHub] [spark] cloud-fan commented on a diff in pull request #29643: [SPARK-32638][SQL][FOLLOWUP] Move the plan rewriting methods to QueryPlan

cloud-fan commented on code in PR #29643:
URL: https://github.com/apache/spark/pull/29643#discussion_r1115586047


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala:
##########
@@ -168,6 +170,89 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
     }.toSeq
   }
 
+  /**
+   * A variant of `transformUp`, which takes care of the case that the rule replaces a plan node
+   * with a new one that has different output expr IDs, by updating the attribute references in
+   * the parent nodes accordingly.
+   *
+   * @param rule the function to transform plan nodes, and return new nodes with attributes mapping
+   *             from old attributes to new attributes. The attribute mapping will be used to
+   *             rewrite attribute references in the parent nodes.
+   * @param skipCond a boolean condition to indicate if we can skip transforming a plan node to save
+   *                 time.
+   */
+  def transformUpWithNewOutput(
+      rule: PartialFunction[PlanType, (PlanType, Seq[(Attribute, Attribute)])],
+      skipCond: PlanType => Boolean = _ => false): PlanType = {
+    def rewrite(plan: PlanType): (PlanType, Seq[(Attribute, Attribute)]) = {
+      if (skipCond(plan)) {
+        plan -> Nil
+      } else {
+        val attrMapping = new mutable.ArrayBuffer[(Attribute, Attribute)]()
+        var newPlan = plan.mapChildren { child =>
+          val (newChild, childAttrMapping) = rewrite(child)
+          attrMapping ++= childAttrMapping
+          newChild
+        }
+
+        val attrMappingForCurrentPlan = attrMapping.filter {
+          // The `attrMappingForCurrentPlan` is used to replace the attributes of the
+          // current `plan`, so the `oldAttr` must be part of `plan.references`.
+          case (oldAttr, _) => plan.references.contains(oldAttr)

Review Comment:
   Can we add a base trait for plans that override `references` with `child.outputSet`? Then we can match this trait here and skip calling `reference`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org