Posted to issues@spark.apache.org by "Ali Afroozeh (Jira)" <ji...@apache.org> on 2020/05/15 12:12:00 UTC

[jira] [Updated] (SPARK-31721) Assert optimized plan is initialized before tracking the execution of planning

     [ https://issues.apache.org/jira/browse/SPARK-31721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ali Afroozeh updated SPARK-31721:
---------------------------------
    Description: 
The {{QueryPlanningTracker}} in {{QueryExecution}} reports a planning time that also includes the optimization time. This happens because {{optimizedPlan}} in {{QueryExecution}} is lazy and is initialized only when it is first accessed. When {{df.queryExecution.executedPlan}} is called, the tracker starts recording the planning time and then accesses the optimized plan, which triggers optimization. As a result, the planning timer starts before optimization and the reported planning time also includes the optimization time.

This PR fixes this behavior by introducing a method {{assertOptimized}}, similar to {{assertAnalyzed}}, that explicitly initializes the optimized plan. This method is called before measuring the time for {{sparkPlan}} and {{executedPlan}}. We call it before {{sparkPlan}} because that also counts as planning time.
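
The effect can be reproduced with a small self-contained sketch. This is a simplified model, not Spark's actual code: {{FakeClock}}, {{Tracker}}, {{QueryExecutionSketch}}, the {{fixed}} flag and the cost values 30 and 10 are all hypothetical names and numbers chosen for illustration. Only {{optimizedPlan}}, {{sparkPlan}} and {{assertOptimized}} mirror names from the description above.

```scala
// Simplified model of the timing issue; NOT Spark's real QueryExecution.
object LazyPlanTimingSketch {

  // Hypothetical deterministic clock: simulated work advances it.
  final class FakeClock {
    var now: Long = 0L
    def advance(units: Long): Unit = now += units
  }

  // Hypothetical stand-in for QueryPlanningTracker: sums elapsed time per phase.
  final class Tracker(clock: FakeClock) {
    private val totals = scala.collection.mutable.Map.empty[String, Long]

    def measurePhase[T](phase: String)(body: => T): T = {
      val start = clock.now
      try body
      finally totals(phase) = totals.getOrElse(phase, 0L) + (clock.now - start)
    }

    def time(phase: String): Long = totals.getOrElse(phase, 0L)
  }

  final class QueryExecutionSketch(clock: FakeClock, tracker: Tracker, fixed: Boolean) {
    // Lazy: the optimization work runs the first time this is accessed.
    lazy val optimizedPlan: String = tracker.measurePhase("optimization") {
      clock.advance(30) // assumed cost of optimization
      "optimized"
    }

    // Forces optimizedPlan so that later timers do not pay for it.
    def assertOptimized(): Unit = optimizedPlan

    lazy val sparkPlan: String = {
      if (fixed) assertOptimized() // the fix: initialize before the timer starts
      tracker.measurePhase("planning") {
        clock.advance(10) // assumed cost of planning itself
        s"physical($optimizedPlan)" // first access happens here when not fixed
      }
    }
  }

  // Returns (optimization time, planning time) as seen by the tracker.
  def run(fixed: Boolean): (Long, Long) = {
    val clock = new FakeClock
    val tracker = new Tracker(clock)
    val qe = new QueryExecutionSketch(clock, tracker, fixed)
    val physical = qe.sparkPlan // triggers planning
    assert(physical == "physical(optimized)")
    (tracker.time("optimization"), tracker.time("planning"))
  }

  def main(args: Array[String]): Unit = {
    println(s"without assertOptimized: ${run(fixed = false)}") // (30,40)
    println(s"with assertOptimized:    ${run(fixed = true)}")  // (30,10)
  }
}
```

Without the explicit initialization, the 30 units of optimization run inside the planning timer, so planning is reported as 40. With {{assertOptimized}} called first, planning is reported as 10 and the optimization time is unchanged.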


> Assert optimized plan is initialized before tracking the execution of planning
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-31721
>                 URL: https://issues.apache.org/jira/browse/SPARK-31721
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Ali Afroozeh
>            Priority: Major
>
> The {{QueryPlanningTracker}} in {{QueryExecution}} reports a planning time that also includes the optimization time. This happens because {{optimizedPlan}} in {{QueryExecution}} is lazy and is initialized only when it is first accessed. When {{df.queryExecution.executedPlan}} is called, the tracker starts recording the planning time and then accesses the optimized plan, which triggers optimization. As a result, the planning timer starts before optimization and the reported planning time also includes the optimization time.
> This PR fixes this behavior by introducing a method {{assertOptimized}}, similar to {{assertAnalyzed}}, that explicitly initializes the optimized plan. This method is called before measuring the time for {{sparkPlan}} and {{executedPlan}}. We call it before {{sparkPlan}} because that also counts as planning time.


