Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/09 04:16:01 UTC

[GitHub] [spark] viirya commented on a change in pull request #27503: [SPARK-30761][SQL] Nested column pruning should not prune on required child outputs in Generate

viirya commented on a change in pull request #27503: [SPARK-30761][SQL] Nested column pruning should not prune on required child outputs in Generate
URL: https://github.com/apache/spark/pull/27503#discussion_r376753522
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
 ##########
 @@ -179,7 +185,10 @@ object GeneratorNestedColumnAliasing {
 
     case g: Generate if SQLConf.get.nestedSchemaPruningEnabled &&
         canPruneGenerator(g.generator) =>
-      NestedColumnAliasing.getAliasSubMap(g.generator.children).map {
+      // For the child outputs required by the operators on top of `Generate`,
+      // we do not want to prune them.
+      val requiredAttrs = AttributeSet(g.requiredChildOutput)
+      NestedColumnAliasing.getAliasSubMap(g.generator.children, requiredAttrs).map {
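
As I read the change, `getAliasSubMap` now takes an exclusion set so that attributes in `g.requiredChildOutput` never become pruning candidates. A minimal sketch of that idea in plain Scala (the `Attribute` stand-in and the `pruneCandidates` helper below are hypothetical, not the real Catalyst API):

```scala
// Hypothetical stand-in for Catalyst's Attribute/AttributeSet, for illustration only.
final case class Attribute(name: String)

object PruningSketch {
  // Root attribute -> nested field paths that could be aliased and pruned away.
  // Any root that a parent operator still requires must be excluded up front.
  def pruneCandidates(
      candidates: Map[Attribute, Seq[String]],
      requiredByParent: Set[Attribute]): Map[Attribute, Seq[String]] =
    candidates.filterNot { case (root, _) => requiredByParent.contains(root) }

  def main(args: Array[String]): Unit = {
    val candidates = Map(
      Attribute("item")    -> Seq("item.a"),    // needed by the parent Project
      Attribute("payload") -> Seq("payload.x")) // safe to prune
    val required = Set(Attribute("item"))       // analogue of g.requiredChildOutput
    // Only `payload` remains a candidate; `item` survives pruning intact.
    println(pruneCandidates(candidates, required))
  }
}
```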
 
 Review comment:
   Normally this case should be handled by the case pattern above (Project + Generate). But if all nested fields are selected at the top Project, that case won't prune anything. Then, when the Optimizer transforms down to the underlying Generate, only the referenced nested columns are kept and the others are pruned from the child, leaving the field accessors at the top Project unresolved.
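
To make the failure mode concrete, here is a hedged sketch of a query with that shape (the schema and names are invented for illustration; this is not the PR's actual regression test):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object Spark30761Repro {
  final case class Item(a: Int, b: String)
  final case class Record(items: Seq[Item])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(Record(Seq(Item(1, "x"), Item(2, "y")))).toDF()

    // explode() introduces a Generate. The top Project selects *every* nested
    // field of the generator output, so the Project + Generate rule prunes
    // nothing; without the fix, the Generate-only rule could still prune the
    // child and leave the `item.a` / `item.b` accessors unresolved.
    df.select(explode($"items").as("item"))
      .select($"item.a", $"item.b")
      .explain(true)

    spark.stop()
  }
}
```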

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org