Posted to reviews@spark.apache.org by "ryan-johnson-databricks (via GitHub)" <gi...@apache.org> on 2023/03/10 04:20:49 UTC

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs

ryan-johnson-databricks commented on code in PR #40321:
URL: https://github.com/apache/spark/pull/40321#discussion_r1131931758


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1033,9 +1033,12 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         requiredAttrIds.contains(a.exprId)) =>
         s.withMetadataColumns()
       case p: Project if p.metadataOutput.exists(a => requiredAttrIds.contains(a.exprId)) =>
+        // Inject the requested metadata columns into the project's output, if not already present.

Review Comment:
   I hit a weird endless loop with this while debugging this `SubqueryAlias` issue. Basically, if the plan root already references a metadata attribute, but that attribute is not available because the `SubqueryAlias` blocked it, this rule keeps (re)appending the metadata column to the projections below the `SubqueryAlias` on every pass. Once the rule had run 100 times (leaving 100 copies of `_metadata` in the `Project` output), the endless loop detector kicked in and killed it.
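
   To illustrate the failure mode, here is a minimal toy sketch, not the actual `Analyzer` code. `Attr`, `Project`, `addMetadataNaive`, `addMetadataGuarded`, and `runToFixedPoint` are all hypothetical stand-ins. It assumes the rule fires whenever a required metadata attribute is in `metadataOutput`, and that the fixed-point driver gives up after a fixed number of passes:

```scala
// Toy model only: none of these types are Spark's real classes.
object MetadataFixedPointSketch {
  final case class Attr(name: String, exprId: Long)
  final case class Project(output: Seq[Attr], metadataOutput: Seq[Attr])

  // Naive rule: appends every required metadata attribute on each pass,
  // without checking whether the output already contains it.
  def addMetadataNaive(p: Project, required: Set[Long]): Project =
    p.copy(output = p.output ++ p.metadataOutput.filter(a => required.contains(a.exprId)))

  // Guarded rule: appends only the required attributes that are missing,
  // so a second pass leaves the plan unchanged.
  def addMetadataGuarded(p: Project, required: Set[Long]): Project = {
    val present = p.output.map(_.exprId).toSet
    val missing = p.metadataOutput.filter { a =>
      required.contains(a.exprId) && !present.contains(a.exprId)
    }
    if (missing.isEmpty) p else p.copy(output = p.output ++ missing)
  }

  // Applies `rule` until the plan stops changing or the pass budget runs out.
  // Returns the final plan and whether a fixed point was reached.
  def runToFixedPoint(start: Project, maxIterations: Int)(
      rule: Project => Project): (Project, Boolean) = {
    @annotation.tailrec
    def loop(current: Project, remaining: Int): (Project, Boolean) =
      if (remaining == 0) (current, false)
      else {
        val next = rule(current)
        if (next == current) (current, true) else loop(next, remaining - 1)
      }
    loop(start, maxIterations)
  }
}
```

   With a starting `Project` whose output lacks `_metadata`, the naive variant never reaches a fixed point and accumulates one extra copy per pass, while the guarded variant settles on the second pass with a single copy.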



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org