Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2021/05/11 07:05:00 UTC
[jira] [Commented] (SPARK-35365) spark3.1.1 use too long to analyze table fields
[ https://issues.apache.org/jira/browse/SPARK-35365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342354#comment-17342354 ]
Yuming Wang commented on SPARK-35365:
-------------------------------------
[~xiaohua] Could you check which rule affects the performance? For example:
{noformat}
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 3022
Total time: 7.941302436 seconds

Rule                                                                   Effective Time / Total Time  Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations       3350202022 / 3357847817      7 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions       11946476 / 588567543         6 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences      516175887 / 577794974        15 / 39
org.apache.spark.sql.catalyst.analysis.TimeWindowing                   0 / 519817133                0 / 39
org.apache.spark.sql.catalyst.analysis.DecimalPrecision                226306881 / 271650752        11 / 39
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin             9838775 / 202214973          1 / 6
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings     141138907 / 188596520        3 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion       107365436 / 185270852        3 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts  58358334 / 140943690         3 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator       0 / 119236169                0 / 39
org.apache.spark.sql.catalyst.optimizer.ColumnPruning                  41291489 / 76464261          2 / 8
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder     0 / 64775042                 0 / 39
org.apache.spark.sql.catalyst.analysis.AlignViewOutput                 0 / 61796761                 0 / 39
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality    0 / 58143331                 0 / 39
{noformat}
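A dump like the one above can be ranked mechanically to see which rule dominates. The following is only a rough sketch, not part of Spark: the helper names parse_rule_metrics and wasted_rules are hypothetical, the sample rows are copied from the dump above, and the times are assumed to be nanoseconds (which is consistent with the "Total time" line). In Spark itself, a dump in this format should be obtainable from org.apache.spark.sql.catalyst.rules.RuleExecutor.dumpTimeSpent().

```python
import re

# Sample rows copied from the metrics dump in this comment.
# Assumption: the time columns are in nanoseconds.
SAMPLE = """\
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 3022
Total time: 7.941302436 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 3350202022 / 3357847817 7 / 39
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 11946476 / 588567543 6 / 39
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 519817133 0 / 39
"""

# One rule row: <rule name> <effective ns> / <total ns> <effective runs> / <total runs>
ROW = re.compile(r"^(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)\s*$")


def parse_rule_metrics(dump):
    """Return one dict per rule row, sorted by total time, slowest first."""
    rows = []
    for line in dump.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # banner, totals, column header, or blank line
        name, eff_ns, total_ns, eff_runs, total_runs = m.groups()
        rows.append({
            "rule": name.rsplit(".", 1)[-1],   # drop the package prefix
            "effective_s": int(eff_ns) / 1e9,  # nanoseconds -> seconds
            "total_s": int(total_ns) / 1e9,
            "effective_runs": int(eff_runs),
            "total_runs": int(total_runs),
        })
    return sorted(rows, key=lambda r: r["total_s"], reverse=True)


def wasted_rules(rows):
    """Names of rules that consumed time but never changed the plan."""
    return [r["rule"] for r in rows
            if r["effective_runs"] == 0 and r["total_s"] > 0]


if __name__ == "__main__":
    for r in parse_rule_metrics(SAMPLE):
        print(f'{r["rule"]:<30} {r["total_s"]:8.3f} s  '
              f'{r["effective_runs"]}/{r["total_runs"]} runs effective')
```

Sorting by total time rather than effective time is deliberate: a rule such as TimeWindowing that never rewrites the plan can still account for a noticeable share of the analysis time.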
> spark3.1.1 use too long to analyze table fields
> -----------------------------------------------
>
> Key: SPARK-35365
> URL: https://issues.apache.org/jira/browse/SPARK-35365
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: yao
> Priority: Major
>
> I have a large SQL query that joins a few wide tables and contains complex logic. On Spark 2.4 the analysis phase takes 20 minutes; on Spark 3.1.1 it takes about 40 minutes.
> I need to set spark.sql.analyzer.maxIterations=1000 on Spark 3.1.1,
> or spark.sql.optimizer.maxIterations=1000 on Spark 2.4.
> There are no other special settings.
> In the Spark UI, no job has been generated and no executor has active tasks. With the log level set to debug, I can see that the query is still in the analysis phase, resolving field references.
> This phase takes too long.