Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/08 04:11:18 UTC
[GitHub] [spark] pan3793 opened a new pull request #35431: [SPARK-36444][SQL][3.1] Remove OptimizeSubqueries from batch of PartitionPruning
pan3793 opened a new pull request #35431:
URL: https://github.com/apache/spark/pull/35431
### What changes were proposed in this pull request?
This is a backport PR of #33664 for branch-3.1.
### Why are the changes needed?
We found a query in production that spent a very long time in the optimize phase when DPP (dynamic partition pruning) was enabled. The SQL pattern looks like this:
```
select <cols...>
from a
left join b on a.<col> = b.<col>
left join c on b.<col> = c.<col>
left join d on c.<col> = d.<col>
left join e on d.<col> = e.<col>
left join f on e.<col> = f.<col>
left join g on f.<col> = g.<col>
left join h on g.<col> = h.<col>
...
```
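For anyone who wants to try the pattern against their own tables, here is a minimal sketch that builds a query of the same shape. The helper name `build_left_join_chain` is hypothetical, and the `<col>` and `<cols...>` placeholders are kept from the pattern above because the real column names are not given; substitute your own before running the result.

```python
def build_left_join_chain(tables, col="<col>", select="<cols...>"):
    """Build a query that left-joins each table to the previous one on `col`.

    `col` and `select` default to the placeholders used in the PR description;
    they are not real column names.
    """
    if not tables:
        raise ValueError("need at least one table")
    lines = [f"select {select}", f"from {tables[0]}"]
    # Pair each table with its predecessor: (a, b), (b, c), ...
    for prev, cur in zip(tables, tables[1:]):
        lines.append(f"left join {cur} on {prev}.{col} = {cur}.{col}")
    return "\n".join(lines)


# Tables a..h, matching the names in the pattern above.
sql = build_left_join_chain(list("abcdefgh"))
print(sql)
```

Passing a longer table list extends the chain, which may help when checking how optimization time changes with the number of joins.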
<details>
<summary>Before this PR, Analyzer/Optimizer costs 15821.6 seconds</summary>
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 24763671
Total time: 15821.588159083 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries 6325784430499 / 14213630904736 2047 / 342802
org.apache.spark.sql.catalyst.optimizer.ColumnPruning 28009746 / 267164777817 1 / 691736
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 118329246545 0 / 342804
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases 8997600 / 71070686823 1 / 348934
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification 923149600 / 59903945393 2036 / 348934
org.apache.spark.sql.catalyst.optimizer.NullPropagation 12343892 / 52895299683 1 / 348934
org.apache.spark.sql.execution.datasources.SchemaPruning 0 / 51638842573 0 / 171401
org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison 0 / 42686293006 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals 0 / 41190229736 0 / 348934
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints 1482075440 / 40899690529 2048 / 171401
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison 0 / 40747831162 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions 0 / 39946334062 0 / 348934
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation 655084587 / 39426073462 2047 / 348934
org.apache.spark.sql.catalyst.optimizer.PruneFilters 489780 / 34731670364 1 / 520335
org.apache.spark.sql.catalyst.optimizer.OptimizeJsonExprs 0 / 33533545307 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate 0 / 31910754610 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ConstantFolding 22871375 / 31308048659 1 / 348934
org.apache.spark.sql.catalyst.optimizer.LikeSimplification 0 / 31276042884 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeWindowFunctions 0 / 31100888868 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator 0 / 30855866688 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts 0 / 30414758547 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeIn 0 / 30211101358 0 / 348934
org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields 0 / 29986491532 0 / 348936
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions 0 / 29740191900 0 / 348934
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps 0 / 25399289000 0 / 348934
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery 0 / 21381558331 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushDownPredicates 5831922974 / 18327095753 3072 / 691737
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions 5007709 / 17431480903 1 / 171401
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabaseAndCatalog 0 / 15321596322 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime 0 / 15060720651 0 / 171401
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates 0 / 14874103589 0 / 171401
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions 50420951 / 14798400219 1 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceUpdateFieldsExpression 0 / 14791439186 0 / 171401
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects 0 / 14361213979 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReassignLambdaVariableID 0 / 14258459712 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteNonCorrelatedExists 0 / 14094898098 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveNoopOperators 1975195 / 12967136767 2 / 691736
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates 0 / 12573667353 0 / 171401
org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown 0 / 11355809540 0 / 171401
org.apache.spark.sql.execution.dynamicpruning.CleanupDynamicPruningFilters 350911193 / 9931582633 2048 / 171401
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation 0 / 8268511133 0 / 348934
org.apache.spark.sql.catalyst.optimizer.CollapseProject 10135014 / 6764166524 1 / 520335
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint 0 / 6244880486 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CombineUnions 0 / 5811583863 0 / 520335
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates 28926471 / 5423624109 1 / 171401
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition 0 / 5245332974 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateSorts 0 / 5051398253 0 / 171401
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation 0 / 4975057807 0 / 342802
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers 0 / 4804919393 0 / 171401
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate 0 / 4408441952 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CollapseWindow 0 / 4349408795 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization 0 / 4170274724 0 / 348934
org.apache.spark.sql.catalyst.optimizer.TransposeWindow 0 / 4113327220 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReorderJoin 0 / 4107443950 0 / 348934
org.apache.spark.sql.catalyst.optimizer.CombineFilters 0 / 4061601886 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion 0 / 4042615936 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin 0 / 4024293929 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateLimits 0 / 3908219408 0 / 348934
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery 135506567 / 3895045754 2047 / 171401
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin 0 / 3875669986 0 / 348934
org.apache.spark.sql.catalyst.optimizer.PushLeftSemiLeftAntiThroughJoin 0 / 3875663441 0 / 348934
org.apache.spark.sql.catalyst.optimizer.LimitPushDown 0 / 3859481346 0 / 348934
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate 0 / 3404517096 0 / 171401
org.apache.spark.sql.catalyst.optimizer.PushExtraPredicateThroughJoin 0 / 2876802099 0 / 171401
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin 0 / 2546037832 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ExtractPythonUDFFromJoinCondition 0 / 2392444200 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions 0 / 2297492778 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll 0 / 2239751163 0 / 171401
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromGenerate 0 / 2189398940 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions 0 / 2187645769 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter 0 / 2135169574 0 / 171401
org.apache.spark.sql.execution.python.ExtractGroupingPythonUDFFromAggregate 0 / 2058019085 0 / 171401
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases 602948 / 2030110600 1 / 171401
org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero 0 / 2028208284 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters 0 / 1985665817 0 / 171401
org.apache.spark.sql.catalyst.analysis.EliminateView 0 / 1965869470 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning 0 / 1892132711 0 / 171401
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll 0 / 1891839429 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin 0 / 1873028976 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate 0 / 1872377392 0 / 171401
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin 0 / 1860128741 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts 0 / 405470117 0 / 342802
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery 0 / 242854215 0 / 171401
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder 0 / 226833963 0 / 171401
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 117151873 / 123533385 7 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 102066025 / 121689146 5 / 13
org.apache.spark.sql.catalyst.optimizer.CombineConcats 0 / 112256648 0 / 348934
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 76183548 / 108546928 4 / 13
org.apache.spark.sql.catalyst.optimizer.EliminateAggregateFilter 0 / 107905167 0 / 348934
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct 0 / 103804760 0 / 171401
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 85607389 / 94162213 8 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 76446295 / 93129469 5 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 53450988 / 87954605 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 0 / 85268434 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 50348094 / 80404327 4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 39786113 / 59089899 3 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 56110133 / 57619849 10 / 13
org.apache.spark.sql.execution.dynamicpruning.PartitionPruning 22888739 / 54329716 1 / 171401
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 0 / 49832160 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 0 / 46158859 0 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable 43323608 / 44575170 1 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion 14900959 / 39466022 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic 0 / 38815701 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 0 / 36470552 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 7216638 / 35530574 1 / 13
org.apache.spark.sql.execution.python.ExtractPythonUDFs 0 / 34271447 0 / 171401
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables 0 / 32166544 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IntegralDivision 0 / 30547791 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveExpressionsWithNamePlaceholders 0 / 30506602 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringLiteralCoercion 0 / 30116261 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions 0 / 27859446 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion 0 / 24218064 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion 0 / 23413444 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed 0 / 22036715 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion 0 / 21957835 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 0 / 21463552 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion 0 / 20489026 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 0 / 20481142 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 0 / 19715779 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 0 / 19693693 0 / 13
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 19530333 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 16526728 / 18814733 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 0 / 16649474 0 / 13
org.apache.spark.sql.execution.aggregate.ResolveEncodersInScalaAgg 0 / 15275758 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions 0 / 11329936 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 4281027 / 10765657 1 / 13
org.apache.spark.sql.catalyst.analysis.CTESubstitution 0 / 9577794 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 0 / 9217208 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns 0 / 8289279 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 0 / 8192539 0 / 13
org.apache.spark.sql.catalyst.analysis.ApplyCharTypePadding 0 / 7991101 0 / 2
org.apache.spark.sql.catalyst.analysis.CleanupAliases 5307130 / 7435920 1 / 3
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 0 / 6506053 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 0 / 6446727 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 0 / 6075593 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog 0 / 4092936 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 0 / 3430531 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables 0 / 2966446 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 0 / 2864876 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 0 / 2818668 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy 0 / 2816430 0 / 13
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin 0 / 2491546 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveUnion 0 / 2477483 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation 0 / 2463866 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF 0 / 2376711 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 0 / 2133804 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec 0 / 1730246 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases 0 / 1712259 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 0 / 1698918 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 0 / 1618569 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 0 / 1589833 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs 0 / 1560151 0 / 13
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 0 / 1500410 0 / 1049
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables 0 / 1463204 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns 0 / 1355593 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace 0 / 1339813 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions 0 / 1338443 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto 0 / 1323062 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 0 / 1318399 0 / 2
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile 0 / 1302134 0 / 13
org.apache.spark.sql.execution.datasources.FallBackFileSourceV2 0 / 1074454 0 / 13
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals 0 / 698629 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints 0 / 452536 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints 0 / 306642 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 0 / 300756 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation 0 / 276685 0 / 2
org.apache.spark.sql.catalyst.analysis.EliminateUnions 0 / 214857 0 / 2
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences 0 / 203514 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 0 / 164163 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveNoopDropTable 0 / 108345 0 / 2
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 0 / 90574 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAlterTableChanges 0 / 85175 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints 0 / 82212 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$DisableHints 0 / 7780 0 / 2
```
</details>
<details>
<summary>After this PR, Analyzer/Optimizer costs 2.4 seconds</summary>
```
=== Metrics of Analyzer/Optimizer Rules ===
Total number of runs: 2140
Total time: 2.407325128 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 79087019 / 116017648 4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 88569423 / 112854377 5 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 89163583 / 95807127 7 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 73197512 / 91715660 5 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 73390555 / 81845702 8 / 13
org.apache.spark.sql.catalyst.optimizer.ColumnPruning 24099474 / 80271976 1 / 6
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 0 / 79928019 0 / 13
org.apache.spark.sql.catalyst.optimizer.PushDownPredicates 73617831 / 79035882 2 / 7
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 52077624 / 78992522 4 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 39724587 / 73183778 1 / 13
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 52807613 / 54633834 10 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 21340706 / 53040407 1 / 13
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions 51595369 / 51595369 1 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 32842401 / 49095324 3 / 13
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 0 / 48574229 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 0 / 43384882 0 / 13
org.apache.spark.sql.execution.datasources.FindDataSourceTable 40348276 / 41709977 1 / 13
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints 40235278 / 40235278 1 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion 13542681 / 38418852 1 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 0 / 35130781 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IntegralDivision 0 / 35129152 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StringLiteralCoercion 0 / 32116493 0 / 13
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates 32065069 / 32065069 1 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveBinaryArithmetic 0 / 30166678 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables 0 / 30082254 0 / 13
org.apache.spark.sql.catalyst.optimizer.ConstantFolding 23910538 / 29049259 1 / 4
org.apache.spark.sql.catalyst.analysis.ResolveExpressionsWithNamePlaceholders 0 / 26220500 0 / 13
org.apache.spark.sql.execution.dynamicpruning.PartitionPruning 25857276 / 25857276 1 / 1
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases 11800935 / 25453815 1 / 4
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion 0 / 24882479 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions 0 / 24773800 0 / 13
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification 0 / 23687102 0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeUpdateFields 0 / 22496833 0 / 6
org.apache.spark.sql.catalyst.analysis.UpdateAttributeNullability 0 / 22086610 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 0 / 21776763 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion 0 / 21668319 0 / 13
org.apache.spark.sql.execution.datasources.SchemaPruning 0 / 21119650 0 / 1
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion 0 / 20995370 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion 0 / 20852224 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 0 / 20482582 0 / 13
org.apache.spark.sql.catalyst.optimizer.NullPropagation 8449073 / 20310580 1 / 4
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals 0 / 19685107 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 0 / 18005681 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed 0 / 17873443 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 0 / 16909722 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 14231820 / 16732791 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 0 / 16280559 0 / 13
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation 0 / 16121492 0 / 4
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 15969419 0 / 13
org.apache.spark.sql.execution.aggregate.ResolveEncodersInScalaAgg 0 / 15441100 0 / 13
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison 0 / 14334553 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 5534984 / 12617237 1 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions 0 / 10014043 0 / 2
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions 0 / 9761592 0 / 4
org.apache.spark.sql.catalyst.analysis.CTESubstitution 0 / 9747538 0 / 2
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery 0 / 9694808 0 / 4
org.apache.spark.sql.catalyst.optimizer.UnwrapCastInBinaryComparison 0 / 9665154 0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeJsonExprs 0 / 9430804 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns 0 / 9322119 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 0 / 9113696 0 / 13
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps 0 / 8938695 0 / 4
org.apache.spark.sql.catalyst.optimizer.OptimizeIn 0 / 8921298 0 / 4
org.apache.spark.sql.catalyst.optimizer.ReplaceNullWithFalseInPredicate 0 / 8695066 0 / 4
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions 0 / 8438285 0 / 4
org.apache.spark.sql.catalyst.analysis.ApplyCharTypePadding 0 / 8406029 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator 0 / 8196064 0 / 4
org.apache.spark.sql.catalyst.optimizer.CollapseProject 6878987 / 8000632 1 / 5
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts 0 / 7996708 0 / 4
org.apache.spark.sql.catalyst.optimizer.LikeSimplification 0 / 7983621 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 0 / 7948113 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 0 / 7930266 0 / 13
org.apache.spark.sql.catalyst.optimizer.OptimizeWindowFunctions 0 / 7569548 0 / 4
org.apache.spark.sql.catalyst.optimizer.PruneFilters 1298537 / 7511669 1 / 5
org.apache.spark.sql.execution.python.ExtractPythonUDFs 0 / 7419094 0 / 1
org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown 0 / 7129007 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 0 / 6953704 0 / 13
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 0 / 6409068 0 / 13
org.apache.spark.sql.catalyst.optimizer.RemoveNoopOperators 2948625 / 6073057 2 / 6
org.apache.spark.sql.catalyst.analysis.CleanupAliases 3304603 / 5512866 1 / 3
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions 5247113 / 5247113 1 / 1
org.apache.spark.sql.catalyst.analysis.ResolveSessionCatalog 0 / 4331593 0 / 13
org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers 0 / 3939580 0 / 1
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation 0 / 3872746 0 / 4
org.apache.spark.sql.execution.dynamicpruning.CleanupDynamicPruningFilters 3285689 / 3285689 1 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables 0 / 2954363 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 0 / 2772122 0 / 13
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates 0 / 2751645 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 0 / 2718652 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 0 / 2682978 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy 0 / 2598472 0 / 13
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate 0 / 2542252 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveEncodersInUDF 0 / 2451105 0 / 2
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects 0 / 2438919 0 / 1
org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin 0 / 2428491 0 / 2
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 0 / 2277468 0 / 1049
org.apache.spark.sql.catalyst.optimizer.EliminateAggregateFilter 0 / 2273198 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateSorts 0 / 2256059 0 / 1
org.apache.spark.sql.catalyst.optimizer.ReplaceUpdateFieldsExpression 0 / 2227798 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveUnion 0 / 2220071 0 / 13
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates 0 / 2216158 0 / 1
org.apache.spark.sql.catalyst.optimizer.ReassignLambdaVariableID 0 / 2190345 0 / 1
org.apache.spark.sql.catalyst.optimizer.RewriteNonCorrelatedExists 0 / 2106499 0 / 1
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions 0 / 2088289 0 / 1
org.apache.spark.sql.catalyst.optimizer.CombineConcats 0 / 2073285 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint 0 / 1922211 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases 0 / 1900401 0 / 13
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime 0 / 1895150 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 0 / 1863956 0 / 13
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabaseAndCatalog 0 / 1816019 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 0 / 1787068 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 0 / 1769046 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolveCatalogs 0 / 1605576 0 / 13
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct 0 / 1591594 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables 0 / 1547587 0 / 13
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery 0 / 1526146 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 0 / 1506795 0 / 13
org.apache.spark.sql.catalyst.analysis.ResolvePartitionSpec 0 / 1502020 0 / 13
org.apache.spark.sql.catalyst.optimizer.PushExtraPredicateThroughJoin 0 / 1442567 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions 0 / 1420037 0 / 13
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile 0 / 1399535 0 / 13
org.apache.spark.sql.execution.python.ExtractGroupingPythonUDFFromAggregate 0 / 1385780 0 / 1
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries 0 / 1359249 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNamespace 0 / 1356344 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation 0 / 1355910 0 / 13
org.apache.spark.sql.execution.datasources.FallBackFileSourceV2 0 / 1325995 0 / 13
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUserSpecifiedColumns 0 / 1325072 0 / 13
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition 0 / 1322557 0 / 4
org.apache.spark.sql.catalyst.optimizer.PushDownLeftSemiAntiJoin 0 / 1300710 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization 0 / 1227217 0 / 4
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 0 / 1226182 0 / 2
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveInsertInto 0 / 1225119 0 / 13
org.apache.spark.sql.catalyst.optimizer.PushLeftSemiLeftAntiThroughJoin 0 / 1172415 0 / 4
org.apache.spark.sql.catalyst.optimizer.ReorderJoin 0 / 1158346 0 / 4
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation 0 / 1156508 0 / 2
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion 0 / 1082244 0 / 4
org.apache.spark.sql.catalyst.optimizer.CombineUnions 0 / 1050254 0 / 5
org.apache.spark.sql.catalyst.optimizer.CollapseWindow 0 / 1003136 0 / 4
org.apache.spark.sql.catalyst.optimizer.TransposeWindow 0 / 1001447 0 / 4
org.apache.spark.sql.catalyst.optimizer.ExtractPythonUDFFromJoinCondition 0 / 1001440 0 / 1
org.apache.spark.sql.catalyst.optimizer.LimitPushDown 0 / 974802 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateLimits 0 / 927602 0 / 4
org.apache.spark.sql.catalyst.optimizer.CombineFilters 0 / 924219 0 / 4
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin 0 / 768081 0 / 4
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals 0 / 733641 0 / 2
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions 0 / 449017 0 / 1
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases 434360 / 434360 1 / 1
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughNonJoin 0 / 379178 0 / 1
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromGenerate 0 / 295206 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 0 / 292907 0 / 2
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters 0 / 256204 0 / 1
org.apache.spark.sql.catalyst.optimizer.ObjectSerializerPruning 0 / 249912 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints 0 / 249513 0 / 2
org.apache.spark.sql.catalyst.analysis.EliminateUnions 0 / 240773 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveJoinStrategyHints 0 / 230937 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate 0 / 222074 0 / 1
org.apache.spark.sql.catalyst.analysis.EliminateView 0 / 216679 0 / 1
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences 0 / 137675 0 / 2
org.apache.spark.sql.execution.datasources.PreprocessTableCreation 0 / 124279 0 / 2
org.apache.spark.sql.catalyst.analysis.ResolveNoopDropTable 0 / 111526 0 / 2
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 0 / 100454 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter 0 / 99833 0 / 1
org.apache.spark.sql.catalyst.optimizer.OptimizeLimitZero 0 / 98320 0 / 1
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 0 / 96634 0 / 2
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll 0 / 96403 0 / 1
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll 0 / 94322 0 / 1
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAlterTableChanges 0 / 94092 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin 0 / 92092 0 / 1
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin 0 / 91821 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints 0 / 91637 0 / 2
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate 0 / 90672 0 / 1
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts 0 / 42624 0 / 2
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery 0 / 25905 0 / 1
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder 0 / 9020 0 / 1
org.apache.spark.sql.catalyst.analysis.ResolveHints$DisableHints 0 / 8111 0 / 2
```
</details>
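The two dumps above can be compared with a small script. This is a sketch, not part of the patch: it assumes the per-rule time columns are in nanoseconds (consistent with the "Total time" lines being in seconds), and the helper names `parse_rule_metrics` and `share_of_total` are made up for illustration. Only two rows of the "before" dump are pasted in to keep it short.

```python
import re

# One rule row: name, "effective / total" time, "effective / total" runs.
ROW = re.compile(r"^(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)$")


def parse_rule_metrics(dump):
    """Return {rule: (effective_time, total_time, effective_runs, total_runs)}."""
    metrics = {}
    for line in dump.splitlines():
        m = ROW.match(line.strip())
        if m:  # header and summary lines do not match and are skipped
            metrics[m.group(1)] = tuple(int(g) for g in m.groups()[1:])
    return metrics


def share_of_total(metrics, rule, total_seconds):
    """Fraction of total time spent in `rule`, ASSUMING rule times are in ns."""
    return metrics[rule][1] / 1e9 / total_seconds


# Rows copied from the "before" dump.
before = """
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries 6325784430499 / 14213630904736 2047 / 342802
org.apache.spark.sql.catalyst.optimizer.ColumnPruning 28009746 / 267164777817 1 / 691736
"""

metrics = parse_rule_metrics(before)
rule = "org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries"
share = share_of_total(metrics, rule, 15821.588159083)  # "before" total time
speedup = 15821.588159083 / 2.407325128                 # before / after totals

print(f"OptimizeSubqueries share of total time: {share:.1%}")
print(f"Overall speedup: {speedup:.0f}x")
```

Under the nanosecond assumption, `OptimizeSubqueries` accounts for roughly 90% of the total time before the change, and the totals differ by a factor on the order of 6,500.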
The original description of SPARK-36444 did not mention this improvement, but the change does significantly improve SQL compile performance in such cases.
### Does this PR introduce _any_ user-facing change?
Significant SQL compile performance improvement for some cases.
### How was this patch tested?
Added UT.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] pan3793 commented on pull request #35431: [SPARK-36444][SQL][3.1] Remove OptimizeSubqueries from batch of PartitionPruning
Posted by GitBox <gi...@apache.org>.
pan3793 commented on pull request #35431:
URL: https://github.com/apache/spark/pull/35431#issuecomment-1032196014
cc @wangyum, I know it's a perf-only change, but considering the huge improvement in SQL compile performance, I think it's worth backporting to branch-3.1.
[GitHub] [spark] dongjoon-hyun closed pull request #35431: [SPARK-36444][SQL][3.1] Remove OptimizeSubqueries from batch of PartitionPruning
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #35431:
URL: https://github.com/apache/spark/pull/35431
[GitHub] [spark] pan3793 commented on pull request #35431: [SPARK-36444][SQL][3.1] Remove OptimizeSubqueries from batch of PartitionPruning
Posted by GitBox <gi...@apache.org>.
pan3793 commented on pull request #35431:
URL: https://github.com/apache/spark/pull/35431#issuecomment-1032200728
Also cc @cloud-fan @maryannxue @yaooqinn
[GitHub] [spark] pan3793 edited a comment on pull request #35431: [SPARK-36444][SQL][3.1] Remove OptimizeSubqueries from batch of PartitionPruning
Posted by GitBox <gi...@apache.org>.
pan3793 edited a comment on pull request #35431:
URL: https://github.com/apache/spark/pull/35431#issuecomment-1032237690
> Apache Spark community follows Semantic Versioning and doesn't allow backporting of new feature or improvements.
Thanks @dongjoon-hyun. I think maybe we can treat it as a bug based on https://github.com/apache/spark/pull/33664#issuecomment-901589931; compared to 2.4s, an Analyzer/Optimizer cost of 15821.6 seconds looks really unreasonable.
> To be clear, I'm asking you to upgrade Kyuubi to 3.2.1 instead of 3.1.2
Thanks for the tip. Kyuubi supports Spark 3.0, 3.1, and 3.2, and builds against Spark 3.1 by default; we are planning to switch the default to Spark 3.2.
[GitHub] [spark] pan3793 commented on pull request #35431: [SPARK-36444][SQL][3.1] Remove OptimizeSubqueries from batch of PartitionPruning
Posted by GitBox <gi...@apache.org>.
pan3793 commented on pull request #35431:
URL: https://github.com/apache/spark/pull/35431#issuecomment-1032237690
> Apache Spark community follows Semantic Versioning and doesn't allow backporting of new feature or improvements.
Thanks @dongjoon-hyun. I think maybe we can treat it as a bug based on https://github.com/apache/spark/pull/33664#issuecomment-901589931; compared to 2.4s, an Analyzer/Optimizer cost of 15821.6 seconds looks really unreasonable.
> To be clear, I'm asking you to upgrade Kyuubi to 3.2.1 instead of 3.1.2