Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/01 06:08:22 UTC

[GitHub] [spark] viirya commented on a change in pull request #28490: [SPARK-31670][SQL]Resolve Struct Field in Grouping Aggregate with same ExprId

viirya commented on a change in pull request #28490:
URL: https://github.com/apache/spark/pull/28490#discussion_r480844760



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.
+     * @param trimAlias whether need to trim alias of Struct field. When true, we will trim
+     *                  Struct field alias. When isTopLevel = true, we won't trim top-level
+     *                  Struct field alias.
+     * @param isTopLevel whether need to trim top-level alias of Struct field. this param is

Review comment:
       nit: this -> This

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.

Review comment:
       nit: use -> used

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.
+     * @param trimAlias whether need to trim alias of Struct field. When true, we will trim
+     *                  Struct field alias. When isTopLevel = true, we won't trim top-level
+     *                  Struct field alias.
+     * @param isTopLevel whether need to trim top-level alias of Struct field. this param is
+     *                   controlled by this method itself to make sure we won't trim top-level
+     *                   Struct field alias. If need to trim top-level Struct field alias,
+     *                   we can do that outside of this method.

Review comment:
       Can we rephrase this param doc too? Do you mean that this param tells the method whether it is resolving a top-level expression, and if it is, we skip trimming the alias of the struct field?
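A minimal sketch of the semantics described above, assuming a simplified model: `Expr`, `Attr`, `GetField`, `Alias`, `Add`, and `trim` are hypothetical stand-ins, not Catalyst's classes or the PR's method. The caller never sets `isTopLevel`; the recursion passes false to every child, so only a non-top-level `Alias` over a field access is removed when `trimAlias` is true.

```scala
// Hypothetical, simplified stand-ins for Catalyst expressions (not Spark's real classes).
sealed trait Expr
final case class Attr(name: String, exprId: Long) extends Expr
final case class GetField(child: Expr, field: String) extends Expr
final case class Alias(child: Expr, name: String, exprId: Long) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object TrimSketch {
  // Callers pass only `trimAlias`; `isTopLevel` starts as true and every
  // recursive call passes false, so the outermost Alias is never trimmed.
  def trim(e: Expr, trimAlias: Boolean, isTopLevel: Boolean = true): Expr = e match {
    // Below the top level, drop an Alias that wraps a struct field access.
    case Alias(g: GetField, _, _) if trimAlias && !isTopLevel =>
      trim(g, trimAlias, isTopLevel = false)
    // Any other Alias (including a top-level one) is kept; its child is still visited.
    case Alias(child, name, id) =>
      Alias(trim(child, trimAlias, isTopLevel = false), name, id)
    case GetField(child, field) =>
      GetField(trim(child, trimAlias, isTopLevel = false), field)
    case Add(left, right) =>
      Add(trim(left, trimAlias, isTopLevel = false), trim(right, trimAlias, isTopLevel = false))
    case a: Attr => a
  }
}
```

For example, with `x = Alias(GetField(s, "b"), "b", 2L)`, calling `trim(Alias(Add(x, x), "sum", 3L), trimAlias = true)` keeps the outer `sum` alias but strips both inner `b` aliases, while `trim(x, trimAlias = true)` returns `x` unchanged because it is top-level.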

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1325,24 +1325,43 @@ class Analyzer(
      *
      * Note : In this routine, the unresolved attributes are resolved from the input plan's
      * children attributes.
+     *
+     * @param e the expression need to be resolved.
+     * @param q the LogicalPlan use to resolve expression's attribute from.
+     * @param trimAlias whether need to trim alias of Struct field. When true, we will trim
+     *                  Struct field alias. When isTopLevel = true, we won't trim top-level
+     *                  Struct field alias.

Review comment:
       This param doc reads oddly. Do you mean that when `trimAlias` is true, the method trims the alias of a struct field, but it won't trim the alias if it is a top-level expression?
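One plausible reason the top-level alias is exempt, sketched with a hypothetical toy model rather than Catalyst's classes: in Catalyst, `Aggregate.aggregateExpressions` is a `Seq[NamedExpression]` (a later hunk in this PR casts the resolved aggregate expressions to `NamedExpression`), and a bare struct field access is not a named expression. The names `Named`, `trimAlias`, and `asOutputList` below are illustrative only.

```scala
// Hypothetical toy model (not Catalyst): only some expressions carry an output name.
sealed trait Expr
sealed trait Named extends Expr { def name: String }
final case class Attr(name: String) extends Named
final case class Alias(child: Expr, name: String) extends Named
final case class GetField(child: Expr, field: String) extends Expr

object TopLevelAliasSketch {
  // Strips an Alias over a field access regardless of position.
  def trimAlias(e: Expr): Expr = e match {
    case Alias(g: GetField, _) => g
    case other => other
  }

  // An aggregate's output list must be all named expressions; None if any entry is unnamed.
  def asOutputList(exprs: Seq[Expr]): Option[Seq[Named]] =
    if (exprs.forall(_.isInstanceOf[Named])) Some(exprs.collect { case n: Named => n })
    else None
}
```

Trimming the alias of a top-level select item such as `s.b AS b` would leave an unnamed entry, which `asOutputList` rejects; keeping the top-level alias preserves the output column name.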

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1428,8 +1428,46 @@ class Analyzer(
       // SPARK-25942: Resolves aggregate expressions with `AppendColumns`'s children, instead of
       // `AppendColumns`, because `AppendColumns`'s serializer might produce conflict attribute
       // names leading to ambiguous references exception.
-      case a @ Aggregate(groupingExprs, aggExprs, appendColumns: AppendColumns) =>
-        a.mapExpressions(resolveExpressionTopDown(_, appendColumns))
+      case a: Aggregate =>
+        val planForResolve = a.child match {
+          case appendColumns: AppendColumns => appendColumns
+          case _ => a
+        }
+
+        val resolvedGroupingExprs =
+          a.groupingExpressions.map(resolveExpressionTopDown(_, planForResolve))
+            .map(trimStructFieldAlias)
+
+        val resolvedAggExprs = a.aggregateExpressions
+          .map(resolveExpressionTopDown(_, planForResolve))
+          .map {

Review comment:
       Hmm, where is `trimNonTopLevelStructFieldAlias`?

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -1425,11 +1444,48 @@ class Analyzer(
       // rule: ResolveDeserializer.
       case plan if containsDeserializer(plan.expressions) => plan
 
-      // SPARK-25942: Resolves aggregate expressions with `AppendColumns`'s children, instead of
-      // `AppendColumns`, because `AppendColumns`'s serializer might produce conflict attribute
-      // names leading to ambiguous references exception.
-      case a @ Aggregate(groupingExprs, aggExprs, appendColumns: AppendColumns) =>
-        a.mapExpressions(resolveExpressionTopDown(_, appendColumns))
+      case a: Aggregate =>
+        val planForResolve = a.child match {
+          case appendColumns: AppendColumns => appendColumns
+          case _ => a
+        }
+
+        val resolvedGroupingExprs = a.groupingExpressions
+          .map(resolveExpressionTopDown(_, planForResolve, trimAlias = true))
+          .map {
+            // trim Alias over top-level GetStructField
+            case Alias(s: GetStructField, _) => s
+            case other => other
+          }
+
+        val resolvedAggExprs = a.aggregateExpressions
+          .map(resolveExpressionTopDown(_, planForResolve, trimAlias = true))
+            .map(_.asInstanceOf[NamedExpression])
+
+        a.copy(resolvedGroupingExprs, resolvedAggExprs, a.child)
+
+      case g: GroupingSets =>
+        val resolvedSelectedExprs = g.selectedGroupByExprs
+          .map(_.map(resolveExpressionTopDown(_, g, trimAlias = true))
+            .map {
+              // trim Alias over top-level GetStructField
+              case Alias(s: GetStructField, _) => s
+              case other => other
+            })

Review comment:
       This is somewhat hard for a reader to understand. Why do we need to trim the alias for these expressions? Can you explain, or maybe add an example as a comment in the code?
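A minimal sketch of the likely motivation, under stated assumptions: the classes are hypothetical stand-ins, case-class equality stands in for Catalyst's `semanticEquals`, and `isValid` only loosely imitates the analyzer's check that every select-list expression is either a grouping expression or built from one. If `s.b` in GROUP BY and `s.b` in the select list are each resolved to an `Alias` with a different expression ID, neither the aliases nor the inner field access match the grouping entry, so the check falls through to the bare attribute `s` and fails. Trimming the top-level alias from the grouping expression lets the select item's child match it directly.

```scala
// Hypothetical stand-ins for Catalyst expressions (not Spark's real classes).
sealed trait Expr
final case class Attr(name: String, exprId: Long) extends Expr
final case class GetField(child: Expr, field: String) extends Expr
final case class Alias(child: Expr, name: String, exprId: Long) extends Expr

object GroupingMatchSketch {
  // Models resolving `struct.field`: each resolution gets its own Alias and exprId.
  def resolveField(struct: Attr, field: String, freshId: Long): Expr =
    Alias(GetField(struct, field), field, freshId)

  // Removes an Alias sitting directly over a field access at the top level.
  def trimTopLevel(e: Expr): Expr = e match {
    case Alias(g: GetField, _, _) => g
    case other => other
  }

  // Loose imitation of the analyzer's check: an expression is valid if it equals a
  // grouping expression (case-class equality stands in for semanticEquals);
  // otherwise its children are checked, and a bare attribute fails.
  def isValid(e: Expr, grouping: Seq[Expr]): Boolean = e match {
    case _ if grouping.contains(e) => true
    case _: Attr => false
    case Alias(child, _, _) => isValid(child, grouping)
    case GetField(child, _) => isValid(child, grouping)
  }
}
```

With the untrimmed grouping list, the check fails at `s`, which roughly corresponds to Spark's "neither present in the group by" analysis error; with `groupBy.map(trimTopLevel)`, the select item's `GetField` child matches and the check passes.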




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


