Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2021/05/11 07:05:00 UTC

[jira] [Commented] (SPARK-35365) Spark 3.1.1 takes too long to analyze table fields

    [ https://issues.apache.org/jira/browse/SPARK-35365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342354#comment-17342354 ] 

Yuming Wang commented on SPARK-35365:
-------------------------------------

[~xiaohua] Could you check which rule affects the performance? For example:
{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 3022
Total time: 7.941302436 seconds

Rule                                                                              Effective Time / Total Time                     Effective Runs / Total Runs                    

org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                  3350202022 / 3357847817                         7 / 39                                         
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                  11946476 / 588567543                            6 / 39                                         
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                 516175887 / 577794974                           15 / 39                                        
org.apache.spark.sql.catalyst.analysis.TimeWindowing                              0 / 519817133                                   0 / 39                                         
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                           226306881 / 271650752                           11 / 39                                        
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                        9838775 / 202214973                             1 / 6                                          
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings                141138907 / 188596520                           3 / 39                                         
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                  107365436 / 185270852                           3 / 39                                         
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts             58358334 / 140943690                            3 / 39                                         
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                  0 / 119236169                                   0 / 39                                         
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                             41291489 / 76464261                             2 / 8                                          
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder                0 / 64775042                                    0 / 39                                         
org.apache.spark.sql.catalyst.analysis.AlignViewOutput                            0 / 61796761                                    0 / 39                                         
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality               0 / 58143331                                    0 / 39     
{noformat}
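
A minimal sketch of how to collect these metrics from a spark-shell session (assuming Spark 3.1.x; the query string is a placeholder for the slow query, and RuleExecutor's resetMetrics()/dumpTimeSpent() are the internal helpers that produce this table):
{code:scala}
import org.apache.spark.sql.catalyst.rules.RuleExecutor

RuleExecutor.resetMetrics()            // clear any rule timings accumulated so far
val df = spark.sql("SELECT ...")       // the slow query (placeholder); analysis runs eagerly here
df.queryExecution.optimizedPlan        // also force the optimizer rules, without launching a job
println(RuleExecutor.dumpTimeSpent())  // prints a per-rule metrics table like the one above
{code}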


> Spark 3.1.1 takes too long to analyze table fields
> --------------------------------------------------
>
>                 Key: SPARK-35365
>                 URL: https://issues.apache.org/jira/browse/SPARK-35365
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: yao
>            Priority: Major
>
> I have a big SQL query that joins a few wide tables with complex logic. When I run it on Spark 2.4, the analyze phase takes about 20 minutes; on Spark 3.1.1 it takes about 40 minutes.
> I need to set spark.sql.analyzer.maxIterations=1000 in Spark 3.1.1
> (or spark.sql.optimizer.maxIterations=1000 in Spark 2.4).
> There is no other special setting for this.
> Checking the Spark UI, I see that no job is generated and no executor has active tasks. When I set the log level to debug, I find that the job is still in the analyze phase, resolving the field references.
> This phase takes too long.


