Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/03 14:23:32 UTC

[GitHub] [spark] maropu commented on a change in pull request #29485: [SPARK-32638][SQL] Corrects references when adding aliases in WidenSetOperationTypes

maropu commented on a change in pull request #29485:
URL: https://github.com/apache/spark/pull/29485#discussion_r483017255



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
##########
@@ -385,12 +408,16 @@ object TypeCoercion {
     }
 
     /** Given a plan, add an extra project on top to widen some columns' data types. */
-    private def widenTypes(plan: LogicalPlan, targetTypes: Seq[DataType]): LogicalPlan = {
+    private def widenTypes(plan: LogicalPlan, targetTypes: Seq[DataType])
+      : (LogicalPlan, LogicalPlan) = {
       val casted = plan.output.zip(targetTypes).map {
-        case (e, dt) if e.dataType != dt => Alias(Cast(e, dt), e.name)()
-        case (e, _) => e
-      }
-      Project(casted, plan)
+        case (e, dt) if e.dataType != dt =>
+          val alias = Alias(Cast(e, dt), e.name)(exprId = e.exprId)
+          alias -> alias.newInstance()
+        case (e, _) =>
+          e -> e
+      }.unzip
+      Project(casted._1, plan) -> Project(casted._2, plan)

Review comment:
       This generates a rewrite map used by `Analyzer.rewritePlan`. Because `rewritePlan` assumes that the plan structure is the same before and after rewriting, the `WidenSetOperationTypes` rule now performs a two-phase transformation, as follows:
   ```
   ### Input Plan (Query described in the PR description) ###
   Project [v#1]
   +- SubqueryAlias t
      +- Union
         :- Project [v#1]
         :  +- SubqueryAlias t3
         :     ...
         +- Project [v#2]
            +- Project [CheckOverflow((promote_precision(cast(v#1 as decimal(11,0))) + promote_precision(cast(v#1 as decimal(11,0)))), DecimalType(11,0), true) AS v#2]
               +- SubqueryAlias t3
                  ...
   
   ### Phase-1 (Adds Project, but does not update ExprId) ###
   Project [v#1]
   +- SubqueryAlias t
      +- Union
         :- Project [cast(v#1 as decimal(11,0)) AS v#1] <--- !!!Adds Project to widen a type!!!
         :  +- Project [v#1]
         :     +- SubqueryAlias t3
         :        ...
         +- Project [v#2]
            +- Project [CheckOverflow((promote_precision(cast(v#1 as decimal(11,0))) + promote_precision(cast(v#1 as decimal(11,0)))), DecimalType(11,0), true) AS v#2]
               ...
   
   ### Phase-2 ###
   // Analyzer.rewritePlan updates ExprIds based on a rewrite map:
   // `Project [cast(v#1 as decimal(11,0)) AS v#1]` => `Project [cast(v#1 as decimal(11,0)) AS v#3]`
   Project [v#3] <--- !!!Updates ExprId!!!
   +- SubqueryAlias t
      +- Union
         :- Project [cast(v#1 as decimal(11,0)) AS v#3] <--- !!!Updates ExprId!!!
         :  +- Project [v#1]
         :     +- SubqueryAlias t3
         :        ...
         +- Project [v#2]
            +- Project [CheckOverflow((promote_precision(cast(v#1 as decimal(11,0))) + promote_precision(cast(v#1 as decimal(11,0)))), DecimalType(11,0), true) AS v#2]
               ...
   ```
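
To make the two phases above concrete, here is a minimal, Spark-free sketch of the same idea. Everything in it (`Attr`, `Plan`, `Relation`, `Project`, `TwoPhaseSketch`, `rewriteMap`, `rewriteRefs`, and the starting type `decimal(10,0)`) is a hypothetical stand-in chosen for illustration. It does not use Catalyst's `Alias`, `Cast`, or `Analyzer.rewritePlan`, and it only approximates how a rewrite map could be derived from the two returned plans.

```scala
// Toy stand-ins for Catalyst attributes and plans (hypothetical, not Spark classes).
final case class Attr(name: String, exprId: Int, dataType: String)

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Plan
final case class Project(output: Seq[Attr], child: Plan) extends Plan

object TwoPhaseSketch {
  // Simple counter standing in for a fresh-ExprId generator.
  private var nextId = 100
  private def freshId(): Int = { nextId += 1; nextId }

  // Phase 1: widen mismatched columns. The first plan keeps the original
  // exprId (so the plan structure and references stay valid); the second
  // plan carries the same columns with fresh exprIds.
  def widenTypes(plan: Plan, targetTypes: Seq[String]): (Plan, Plan) = {
    val (kept, fresh) = plan.output.zip(targetTypes).map {
      case (a, dt) if a.dataType != dt =>
        val widened = a.copy(dataType = dt)
        widened -> widened.copy(exprId = freshId())
      case (a, _) =>
        a -> a
    }.unzip
    Project(kept, plan) -> Project(fresh, plan)
  }

  // Build an old-exprId -> new-attribute map from the two plans.
  def rewriteMap(before: Plan, after: Plan): Map[Int, Attr] =
    before.output.zip(after.output).collect {
      case (o, n) if o.exprId != n.exprId => o.exprId -> n
    }.toMap

  // Phase 2: update references held by a parent operator using the map.
  def rewriteRefs(refs: Seq[Attr], mapping: Map[Int, Attr]): Seq[Attr] =
    refs.map(a => mapping.getOrElse(a.exprId, a))
}
```

In this sketch, a parent that referenced `v#1` is pointed at the freshly numbered, widened column only in the second step, which mirrors why Phase-1 leaves the `ExprId` untouched and Phase-2 updates it.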



