Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/08 22:05:17 UTC

[GitHub] [spark] bersprockets commented on a change in pull request #35232: [SPARK-37947][SQL] Extract generator from GeneratorOuter expression contained by a Generate operator.

bersprockets commented on a change in pull request #35232:
URL: https://github.com/apache/spark/pull/35232#discussion_r822112696



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -2812,6 +2812,9 @@ class Analyzer(override val catalogManager: CatalogManager)
           p
         }
 
+      case g @ Generate(GeneratorOuter(generator), _, _, _, _, _) =>

Review comment:
       >how do we make sure that this is triggered before the analyzer strips GeneratorOuter?
   
   I am taking "strip" to mean "remove" (in case my answer makes no sense).
   
   The issue here is that the Analyzer was failing to strip the `GeneratorOuter` altogether.
   
   I could find only 2 places where the `GeneratorOuter` is stripped: [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L2750) and [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L2790).
   
   The first case strips a `GeneratorOuter` from the aggregate expressions of an `Aggregate` operator. When moving the generator from the aggregate expressions to a project list, this code re-wraps the generator with a `GeneratorOuter` (so in the end, the `GeneratorOuter` simply moves to a new location).
   
   The second case strips the `GeneratorOuter` from a project list.
   
   The missing case (which this PR attempts to cover) is replacing a `GeneratorOuter` contained in a `Generate` operator with its child generator (and setting the `outer` flag) so that the `ResolveGenerator` rule can match on the `Generate` operator.
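
As a rough illustration of that missing case, here is a minimal, self-contained sketch. It uses toy stand-ins rather than the real Catalyst classes: `Expr`, `Plan`, `Explode`, `Relation`, the three-field `Generate`, and the `UnwrapGeneratorOuter` object are all hypothetical names invented for this example, and the body is an assumption about the shape of the fix, not the code in this PR.

```scala
// Toy model only: these types imitate the idea, not the Spark API.
sealed trait Expr
case class Explode(column: String) extends Expr      // stand-in for a concrete generator
case class GeneratorOuter(child: Expr) extends Expr  // wrapper that marks "outer" semantics

sealed trait Plan
case class Relation(name: String) extends Plan
// The real Generate operator has more fields; three are enough for the sketch.
case class Generate(generator: Expr, outer: Boolean, child: Plan) extends Plan

object UnwrapGeneratorOuter {
  // If a Generate directly holds a GeneratorOuter, replace the wrapper with
  // its child generator and record the outer semantics on the operator.
  def apply(plan: Plan): Plan = plan match {
    case g @ Generate(GeneratorOuter(generator), _, child) =>
      g.copy(generator = generator, outer = true, child = apply(child))
    case g @ Generate(_, _, child) =>
      g.copy(child = apply(child))
    case other =>
      other
  }
}
```

In this toy model, `UnwrapGeneratorOuter(Generate(GeneratorOuter(Explode("a")), outer = false, Relation("t")))` yields a `Generate` whose generator is the bare `Explode("a")` and whose `outer` flag is `true`. A later rule that matches only on unwrapped generators could then see it.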




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: reviews-help@spark.apache.org