Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/03/10 22:09:00 UTC

[jira] [Commented] (SPARK-38512) ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions

    [ https://issues.apache.org/jira/browse/SPARK-38512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17504617#comment-17504617 ] 

Apache Spark commented on SPARK-38512:
--------------------------------------

User 'alexeykudinkin' has created a pull request for this issue:
https://github.com/apache/spark/pull/35808

> ResolveFunctions implemented incorrectly requiring multiple passes to Resolve Nested Expressions 
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38512
>                 URL: https://issues.apache.org/jira/browse/SPARK-38512
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: Alexey Kudinkin
>            Priority: Critical
>
> The ResolveFunctions rule is implemented incorrectly, requiring multiple passes to resolve nested expressions:
> While the plan object is traversed correctly in post-order (bottom-up, via `plan.resolveOperatorsUpWithPruning`), the expressions within each plan node are traversed incorrectly in pre-order (top-down, using `transformExpressionsWithPruning`):
>  
> {code:java}
> case q: LogicalPlan =>
>   q.transformExpressionsWithPruning(...) { ... } {code}
>  
> Traversing in pre-order means that an attempt is made to resolve the current node before its children are resolved. This is incorrect, since a node cannot be resolved until its children are.
> While this does not yet lead to failures, it is taxing on performance: most expressions in Spark could be resolved in a *single pass* (if resolved bottom-up; see the reproducible sample at the bottom). Instead, it currently takes Spark at least *N* iterations to resolve such expressions, where N is proportional to the depth of the expression tree.
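The N-passes claim can be illustrated with a self-contained sketch that has no Spark dependency (the `Leaf`/`Call` toy tree and helper names below are hypothetical, not Spark's API): a pre-order pass attempts each node before its children, so each iteration can only resolve the bottom-most unresolved node, while a single post-order pass resolves the whole tree.

```scala
// Toy model: a node can only be "resolved" once its child is resolved,
// mimicking how analyzer rules resolve nested function calls.
sealed trait Expr { def resolved: Boolean }
case class Leaf() extends Expr { val resolved = true }
case class Call(child: Expr, resolved: Boolean = false) extends Expr

// A node resolves only if its child is already resolved.
def tryResolve(c: Call): Call =
  if (c.child.resolved) c.copy(resolved = true) else c

// Pre-order (top-down): attempt the node first, then recurse into the child.
def topDownPass(e: Expr): Expr = e match {
  case c: Call =>
    val attempted = tryResolve(c)
    attempted.copy(child = topDownPass(attempted.child))
  case leaf => leaf
}

// Post-order (bottom-up): recurse into the child first, then attempt the node.
def bottomUpPass(e: Expr): Expr = e match {
  case c: Call => tryResolve(c.copy(child = bottomUpPass(c.child)))
  case leaf => leaf
}

// Count how many passes a strategy needs until the tree is fully resolved.
def passesUntilResolved(pass: Expr => Expr, e0: Expr): Int = {
  def allResolved(e: Expr): Boolean = e match {
    case c: Call => c.resolved && allResolved(c.child)
    case _       => true
  }
  var e = e0; var n = 0
  while (!allResolved(e)) { e = pass(e); n += 1 }
  n
}

// Depth-3 nesting, like date_format(to_timestamp(col, ...), ...):
val nested = Call(Call(Call(Leaf())))
println(passesUntilResolved(topDownPass, nested))  // 3 — one level per pass
println(passesUntilResolved(bottomUpPass, nested)) // 1 — single pass
```

With the top-down strategy, each pass resolves only the deepest unresolved `Call` (the one whose child is already resolved), so resolution cost grows with nesting depth, exactly the behavior described above.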
>  
> Example to reproduce: 
>  
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.catalyst.expressions.Expression
> import org.apache.spark.sql.catalyst.plans.logical.{Filter, LocalRelation, LogicalPlan}
> import org.apache.spark.sql.catalyst.rules.Rule
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
>
> def resolveExpr(spark: SparkSession, exprStr: String, tableSchema: StructType): Expression = {
>   val expr = spark.sessionState.sqlParser.parseExpression(exprStr)
>   val analyzer = spark.sessionState.analyzer
>   val schemaFields = tableSchema.fields
>   val resolvedExpr = {
>     val plan: LogicalPlan = Filter(expr, LocalRelation(schemaFields.head, schemaFields.drop(1): _*))
>     val rules: Seq[Rule[LogicalPlan]] = {
>       analyzer.ResolveFunctions ::
>       analyzer.ResolveReferences ::
>       Nil
>     }
>     rules.foldRight(plan)((rule, plan) => rule.apply(plan))
>       .asInstanceOf[Filter]
>       .condition
>   }
>   resolvedExpr
> }
> // Invoke with
> resolveExpr(spark, "date_format(to_timestamp(B, 'yyyy-MM-dd'), 'MM/dd/yyyy')", StructType(Seq(StructField("B", StringType)))){code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org