Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/10/20 01:43:15 UTC
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/22778
[SPARK-25784][SQL] Infer filters from constraints after rewriting predicate subquery
## What changes were proposed in this pull request?
Infer filters from constraints after rewriting predicate subquery.
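As a rough illustration of the idea (a toy model only, not the actual Catalyst rule): once `col IN (subquery)` has been rewritten into a left semi join on an equality condition, neither side can contribute a match through a NULL key, so a not-null filter can be inferred on both inputs. The `Plan`, `Relation`, `Filter`, `SemiJoin` and `InferNotNull` names below are hypothetical and exist only for this sketch.

```scala
// Toy model only: these types are hypothetical and are not Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String, columns: Seq[String]) extends Plan
// Keeps only the rows where `notNullColumn` is not NULL.
case class Filter(notNullColumn: String, child: Plan) extends Plan
case class SemiJoin(left: Plan, right: Plan, leftKey: String, rightKey: String) extends Plan

object InferNotNull {
  // An equality join key never matches NULL, so both sides may drop NULL keys early.
  def apply(plan: Plan): Plan = plan match {
    case SemiJoin(l, r, lk, rk) =>
      SemiJoin(addFilter(lk, l), addFilter(rk, r), lk, rk)
    case other => other
  }

  // Avoid stacking the same filter twice if the rule runs more than once.
  private def addFilter(column: String, child: Plan): Plan = child match {
    case Filter(`column`, _) => child
    case _ => Filter(column, child)
  }
}

object InferNotNullDemo extends App {
  val t1 = Relation("t1", Seq("c1", "c2", "c3"))
  val t2 = Relation("t2", Seq("c1"))
  // Shape produced by rewriting `t1.c1 IN (SELECT c1 FROM t2)` into a left semi join.
  val rewritten = SemiJoin(t1, t2, "c1", "c1")
  println(InferNotNull(rewritten))
}
```

In the real optimizer the inferred filters only help if they are subsequently pushed down towards the scans, which is presumably why the diff under review adds `PushDownPredicate` next to `InferFiltersFromConstraints` in the same batch.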
## How was this patch tested?
unit tests and benchmark tests
```scala
withTempView("t1", "t2") {
  withTempDir { dir =>
    spark.range(3000000)
      .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
      .coalesce(1)
      .orderBy("c2")
      .write
      .mode("overwrite")
      .option("parquet.block.size", 10485760)
      .parquet(dir.getCanonicalPath)

    spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
    spark.read.parquet(dir.getCanonicalPath).createTempView("t2")

    Seq("c1", "c2", "c3").foreach { column =>
      val benchmark = new Benchmark(s"join key $column", 10)
      Seq(false, true).foreach { inferFilters =>
        benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
          withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
            sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
          }
        }
      }
      benchmark.run()
    }
  }
}
```
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
Is infer filters true                          190 /  207          0.0    18962935.7      10.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
```
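For reading the results above: the Relative column appears consistent with dividing the baseline case's best time by each case's best time. The small sketch below recomputes it from the Best values copied from the output; the `Speedup` and `SpeedupDemo` names are made up for this illustration and are not part of Spark's `Benchmark` utility.

```scala
object Speedup {
  // Ratio of the baseline best time to the candidate best time (both in milliseconds).
  def relative(baselineMs: Double, candidateMs: Double): Double = baselineMs / candidateMs
}

object SpeedupDemo extends App {
  // (join key, best ms with inference disabled, best ms with inference enabled),
  // copied from the benchmark output above.
  val bestTimes = Seq(("c1", 2005.0, 190.0), ("c2", 2368.0, 1234.0), ("c3", 2754.0, 2237.0))

  bestTimes.foreach { case (key, disabledMs, enabledMs) =>
    println(f"join key $key: ${Speedup.relative(disabledMs, enabledMs)}%.1fX")
  }
}
```

The ratios round to the 10.6X, 1.9X and 1.2X figures reported in the output. The gain is largest for `c1`, which is entirely NULL, and smallest for `c3`, which contains no NULLs, which is what one would expect if the benefit comes from filtering out NULL join keys before the join.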
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25784
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22778.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22778
----
commit c8d1b91b93e7ad05ca0bd17984fad1c30062d504
Author: Yuming Wang <yu...@...>
Date: 2018-10-20T01:39:51Z
Infer filters from constraints after rewriting predicate subquery
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227205054
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
Batch("Rewrite Subquery", FixedPoint(1),
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
CollapseProject,
RemoveRedundantProject) :: Nil
}
test("Column pruning after rewriting predicate subquery") {
- val relation = LocalRelation('a.int, 'b.int)
- val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+ withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
--- End diff --
Do we need to modify this existing test?
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227214593
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
Batch("Rewrite Subquery", FixedPoint(1),
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
CollapseProject,
RemoveRedundantProject) :: Nil
}
test("Column pruning after rewriting predicate subquery") {
- val relation = LocalRelation('a.int, 'b.int)
- val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+ withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
+ val relation = LocalRelation('a.int, 'b.int)
+ val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
- val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
+ val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
- val optimized = Optimize.execute(query.analyze)
- val correctAnswer = relation
- .select('a)
- .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
- .analyze
+ val optimized = Optimize.execute(query.analyze)
+ val correctAnswer = relation
+ .select('a)
+ .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
+ .analyze
- comparePlans(optimized, correctAnswer)
+ comparePlans(optimized, correctAnswer)
+ }
+ }
+
+ test("Infer filters and push down predicate after rewriting predicate subquery") {
--- End diff --
Do we need column pruning in the test title?
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #97655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97655/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4162/
Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #98246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98246/testReport)** for PR 22778 at commit [`db519c3`](https://github.com/apache/spark/commit/db519c3f0241cb13880665eca934e6d6e34e6fd7).
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22778
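@maropu These are the optimizer statistics before and after this patch. Each side is sorted by its own total time, so beyond the first five rules the right-hand figures do not line up with the rule name shown on the left.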
```
=== Metrics of Analyzer/Optimizer Rules before this patch === === Metrics of Analyzer/Optimizer Rules after this patch ===
Total number of runs: 220860 Total number of runs: 221842
Total time: 92.850628879 seconds Total time: 100.810327761 seconds
Rule Effective Time / Total Time Effective Runs / Total Runs Effective Time / Total Time Effective Runs / Total Runs
org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries 10132656768 / 10253620575 47 / 390 12304351112 / 12424750794 47 / 390
org.apache.spark.sql.catalyst.optimizer.ColumnPruning 1880308393 / 8329162616 391 / 2333 2117896745 / 9071585339 391 / 2333
org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution 5092672075 / 5110064107 58 / 890 4887656663 / 4903590901 58 / 890
org.apache.spark.sql.catalyst.optimizer.PruneFilters 61752234 / 4105761812 5 / 1943 70559207 / 4465280263 5 / 1943
org.apache.spark.sql.catalyst.optimizer.ReorderJoin 2721845400 / 3697707596 181 / 1943 3356876470 / 4365735954 181 / 1943
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences 2152781072 / 3114708964 846 / 2296 2014517480 / 3580105409 312 / 780
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions 875336763 / 2522798268 45 / 2296 2054648919 / 3061650250 846 / 2296
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin 1301162184 / 1925078331 811 / 1943 880342476 / 2535797956 45 / 2296
org.apache.spark.sql.catalyst.optimizer.BooleanSimplification 35280989 / 1822909975 9 / 1943 1415726619 / 2070669333 811 / 1943
org.apache.spark.sql.catalyst.analysis.DecimalPrecision 1486561519 / 1791324588 288 / 2296 38775682 / 1883707910 9 / 1943
org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints 1289610205 / 1384744531 278 / 390 1444130367 / 1745543467 288 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability 29636254 / 1355280611 12 / 820 1109816330 / 1619640495 853 / 2333
org.apache.spark.sql.catalyst.optimizer.ConstantFolding 206713838 / 1331971096 194 / 1943 69725969 / 1417183573 42 / 1943
org.apache.spark.sql.catalyst.optimizer.PushDownPredicate 959279687 / 1331761891 820 / 1943 0 / 1404733663 0 / 1943
org.apache.spark.sql.catalyst.optimizer.NullPropagation 65910473 / 1324952383 42 / 1943 195479597 / 1340347970 194 / 1943
org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator 0 / 1268803431 0 / 1943 0 / 1331100388 0 / 1943
org.apache.spark.sql.catalyst.optimizer.OptimizeIn 23201608 / 1258483307 27 / 1943 1105179405 / 1317882379 59 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery 1040041843 / 1251992635 59 / 2296 74349060 / 1291276637 84 / 1943
org.apache.spark.sql.catalyst.optimizer.LikeSimplification 1347508 / 1248925860 1 / 1943 0 / 1289545537 0 / 1943
org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin 45563225 / 1246306320 15 / 1943 0 / 1288186620 0 / 1943
org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals 0 / 1234509354 0 / 1943 23417314 / 1279842595 27 / 1943
org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison 0 / 1224533697 0 / 1943 0 / 1279108156 0 / 1943
org.apache.spark.sql.catalyst.optimizer.SimplifyCasts 75406425 / 1215567094 84 / 1943 1361911 / 1268795854 1 / 1943
org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions 0 / 1196031525 0 / 1943 42577656 / 1223661384 15 / 1943
org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions 0 / 1184848914 0 / 1943 0 / 1200658411 0 / 1943
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences 31233040 / 1184415886 10 / 2296 29312354 / 1162683260 10 / 2296
org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps 0 / 1159058474 0 / 1943 60950940 / 1155752729 47 / 2333
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases 5698821 / 1153787317 10 / 1943 122109389 / 1152651163 155 / 2333
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject 78177701 / 1123087719 47 / 2333 6731330 / 1135242708 10 / 1943
org.apache.spark.sql.catalyst.optimizer.CollapseProject 123874416 / 1101850924 155 / 2333 10567406 / 1130613449 8 / 1943
org.apache.spark.sql.catalyst.optimizer.CombineFilters 449980198 / 1051429309 636 / 1943 30638232 / 1111306240 12 / 820
org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery 9712531 / 1010330915 8 / 1943 29494556 / 1058321219 43 / 2333
org.apache.spark.sql.catalyst.optimizer.CombineUnions 31689453 / 991412499 43 / 2333 0 / 1057978599 0 / 1943
org.apache.spark.sql.catalyst.optimizer.CollapseRepartition 0 / 985591919 0 / 1943 399285885 / 1036327468 111 / 2296
org.apache.spark.sql.catalyst.optimizer.ConstantPropagation 0 / 953077628 0 / 1943 476914099 / 1011633729 636 / 1943
org.apache.spark.sql.catalyst.optimizer.EliminateSorts 0 / 909291634 0 / 1943 0 / 1001606023 0 / 1943
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts 365802139 / 908733912 111 / 2296 0 / 997621178 0 / 1943
org.apache.spark.sql.catalyst.optimizer.CollapseWindow 0 / 897540927 0 / 1943 0 / 927316666 0 / 1943
org.apache.spark.sql.catalyst.optimizer.CombineLimits 0 / 875781956 0 / 1943 0 / 922370179 0 / 1943
org.apache.spark.sql.catalyst.optimizer.LimitPushDown 0 / 868110787 0 / 1943 0 / 917919176 0 / 1943
org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion 26326365 / 865776545 21 / 1943 23992497 / 900311285 21 / 1943
org.apache.spark.sql.catalyst.optimizer.EliminateSerialization 0 / 854607673 0 / 1943 0 / 892746271 0 / 1943
org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion 262665875 / 579658305 62 / 2296 260616344 / 593811716 62 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions 153745818 / 475319649 408 / 2296 241689927 / 455832126 33 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion 235816769 / 455590494 33 / 2296 152047701 / 451570922 408 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings 13635691 / 355369662 11 / 2296 9052332 / 383939387 4 / 2296
org.apache.spark.sql.catalyst.analysis.ResolveTimeZone 146691973 / 323105943 566 / 2296 15974079 / 369836047 11 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion 6787826 / 321386706 4 / 2296 0 / 364936587 0 / 390
org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division 31419357 / 319844433 10 / 2296 147860974 / 331495653 566 / 2296
org.apache.spark.sql.catalyst.optimizer.UpdateNullabilityInAttributeReferences 667532 / 316522643 1 / 390 32499587 / 329200084 10 / 2296
org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation 9098104 / 315856886 5 / 785 635601 / 324689035 1 / 390
org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations 9601865 / 307128084 43 / 2296 0 / 321984844 0 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion 0 / 304912772 0 / 2296 8226740 / 319740326 43 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality 0 / 304196921 0 / 2296 0 / 315499126 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator 0 / 300839906 0 / 2296 0 / 313318332 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder 48154044 / 291379133 24 / 2296 0 / 312265841 0 / 390
org.apache.spark.sql.catalyst.optimizer.DecimalAggregates 108634840 / 289690009 124 / 514 0 / 308258712 0 / 2296
org.apache.spark.sql.execution.python.ExtractPythonUDFs 0 / 278550649 0 / 390 103129068 / 289052907 124 / 514
org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct 0 / 273938538 0 / 2296 0 / 265747445 0 / 2296
org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll 0 / 268165845 0 / 432 10768959 / 260676307 24 / 2296
org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates 74395591 / 239152525 27 / 390 0 / 249739752 0 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion 0 / 236351130 0 / 2296 13236741 / 244925139 5 / 785
org.apache.spark.sql.catalyst.analysis.TimeWindowing 0 / 235225556 0 / 2296 0 / 244460591 0 / 390
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions 0 / 230404877 0 / 390 204172100 / 242598141 222 / 390
org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions 0 / 229768333 0 / 2296 12417039 / 235806609 37 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame 10792824 / 228415803 37 / 2296 0 / 232472018 0 / 2296
org.apache.spark.sql.execution.datasources.FindDataSourceTable 179627028 / 228013811 294 / 2296 184333924 / 228692095 294 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion 0 / 225389128 0 / 2296 59881230 / 226584657 27 / 390
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion 0 / 218608000 0 / 2296 0 / 221887078 0 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion 0 / 218194486 0 / 2296 0 / 215732842 0 / 2296
org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery 173523514 / 214927844 222 / 390 0 / 213744878 0 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion 0 / 214340662 0 / 2296 0 / 213295968 0 / 2296
org.apache.spark.sql.catalyst.optimizer.FoldablePropagation 0 / 199432071 0 / 1943 0 / 194465130 0 / 1943
org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects 0 / 185179070 0 / 390 0 / 178698382 0 / 390
org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery 0 / 176454024 0 / 390 0 / 173558271 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions 39309018 / 176186699 38 / 2296 43715379 / 172272368 38 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed 0 / 170022403 0 / 2296 0 / 165447407 0 / 390
org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate 0 / 159757503 0 / 390 0 / 160391099 0 / 390
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime 0 / 158140055 0 / 390 41644289 / 159215748 12 / 2296
org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases 141871895 / 156820337 296 / 390 0 / 158779735 0 / 390
org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables 0 / 152376408 0 / 2296 0 / 157874334 0 / 390
org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions 0 / 152316462 0 / 390 0 / 157132550 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics 42422112 / 151049951 12 / 2296 140810447 / 156338363 296 / 390
org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase 0 / 137267577 0 / 390 24557398 / 151324074 42 / 432
org.apache.spark.sql.catalyst.optimizer.PullOutPythonUDFInJoinCondition 0 / 133692760 0 / 390 71287846 / 142242298 507 / 1327
org.apache.spark.sql.catalyst.analysis.CleanupAliases 68056569 / 131655681 507 / 1327 0 / 134581169 0 / 390
org.apache.spark.sql.catalyst.optimizer.RemoveRedundantSorts 0 / 130701173 0 / 390 0 / 133089737 0 / 390
org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic 0 / 127233011 0 / 820 0 / 130835073 0 / 390
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases 17107572 / 125147408 53 / 2296 0 / 128705527 0 / 820
org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters 0 / 123463473 0 / 390 0 / 125210046 0 / 432
org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll 0 / 114173237 0 / 432 12476277 / 121836057 24 / 432
org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin 10889547 / 112652867 24 / 432 21381778 / 118895727 53 / 2296
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter 0 / 110008368 0 / 432 0 / 116345587 0 / 432
org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate 21610795 / 109828668 42 / 432 0 / 113808944 0 / 432
org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin 1216445 / 108203801 1 / 432 1720957 / 112284623 1 / 432
org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates 0 / 107287411 0 / 390 59563957 / 106598698 294 / 2306
org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions 2159170 / 104265980 2 / 392 0 / 106179171 0 / 390
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer 0 / 103215140 0 / 2296 0 / 103787927 0 / 390
org.apache.spark.sql.catalyst.analysis.EliminateView 0 / 102296786 0 / 390 0 / 103305027 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations 50917277 / 97159389 294 / 2306 4385370 / 102288812 2 / 392
org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions 0 / 92841832 0 / 392 0 / 97691358 0 / 392
org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates 3915170 / 92591154 62 / 2296 4051673 / 94971539 62 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions 0 / 89117456 0 / 830 0 / 93198423 0 / 390
org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate 0 / 88821013 0 / 390 0 / 90764806 0 / 830
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast 0 / 84561708 0 / 2296 19857036 / 89051825 24 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance 0 / 84221700 0 / 2296 0 / 82792409 0 / 2296
org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes 21598496 / 83356934 24 / 2296 0 / 81606995 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF 0 / 71959898 0 / 820 2297618 / 79871191 8 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy 2555449 / 69189122 8 / 2296 0 / 70548261 0 / 820
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot 0 / 63485517 0 / 2296 45895772 / 59588549 24 / 820
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy 0 / 58879583 0 / 2296 0 / 57888686 0 / 2296
org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 80429 / 57856725 76 / 11912 0 / 54729566 0 / 2296
org.apache.spark.sql.execution.datasources.DataSourceAnalysis 41813139 / 55552144 24 / 820 0 / 54556018 0 / 2296
org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions 0 / 53367328 0 / 2306 0 / 50850649 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate 0 / 51405189 0 / 2296 72764 / 50825747 76 / 12114
org.apache.spark.sql.catalyst.analysis.ResolveInlineTables 0 / 51249460 0 / 2296 0 / 50057052 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin 0 / 50353332 0 / 2296 0 / 49708235 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases 0 / 49502239 0 / 2296 0 / 48841848 0 / 2296
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation 0 / 49417950 0 / 2296 0 / 48215992 0 / 2296
org.apache.spark.sql.execution.datasources.ResolveSQLOnFile 0 / 49374943 0 / 2296 0 / 47911673 0 / 2306
org.apache.spark.sql.execution.datasources.PreprocessTableCreation 0 / 30942645 0 / 820 0 / 33275799 0 / 820
org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals 4537227 / 30392738 8 / 890 4410251 / 32033067 8 / 890
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints 0 / 27361418 0 / 830 0 / 28707886 0 / 830
org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts 0 / 22963407 0 / 390 0 / 24315925 0 / 390
org.apache.spark.sql.catalyst.analysis.EliminateUnions 0 / 22593930 0 / 890 0 / 20247865 0 / 820
org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution 0 / 22089269 0 / 890 0 / 19302887 0 / 890
org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences 0 / 20769130 0 / 820 0 / 19285784 0 / 890
org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints 0 / 20604280 0 / 830 0 / 16660442 0 / 830
org.apache.spark.sql.catalyst.analysis.AliasViewChild 0 / 15787426 0 / 820 0 / 15849217 0 / 820
org.apache.spark.sql.execution.datasources.PreprocessTableInsertion 0 / 14039160 0 / 820 0 / 13514297 0 / 820
org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints 0 / 12451609 0 / 830 0 / 12090842 0 / 830
org.apache.spark.sql.catalyst.optimizer.CombineConcats 0 / 9915780 0 / 1943 0 / 9786602 0 / 1943
org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder 0 / 4431280 0 / 390 0 / 4314429 0 / 390
org.apache.spark.sql.catalyst.optimizer.EliminateDistinct 0 / 4138833 0 / 390 0 / 3707721 0 / 390
org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaPruning 0 / 3070796 0 / 390 0 / 3494547 0 / 390
```
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227236021
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
Batch("Rewrite Subquery", FixedPoint(1),
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
CollapseProject,
RemoveRedundantProject) :: Nil
}
test("Column pruning after rewriting predicate subquery") {
- val relation = LocalRelation('a.int, 'b.int)
- val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+ withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
+ val relation = LocalRelation('a.int, 'b.int)
+ val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
- val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
+ val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
- val optimized = Optimize.execute(query.analyze)
- val correctAnswer = relation
- .select('a)
- .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
- .analyze
+ val optimized = Optimize.execute(query.analyze)
+ val correctAnswer = relation
+ .select('a)
+ .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
+ .analyze
- comparePlans(optimized, correctAnswer)
+ comparePlans(optimized, correctAnswer)
+ }
+ }
+
+ test("Infer filters and push down predicate after rewriting predicate subquery") {
--- End diff --
How about refactoring these tests to:
```scala
val relation = LocalRelation('a.int, 'b.int)
val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)

test("Column pruning") {
  withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    val optimized = Optimize.execute(query.analyze)
    val correctAnswer = relation
      .select('a)
      .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
      .analyze
    comparePlans(optimized, correctAnswer)
  }
}

test("Column pruning, infer filters and push down predicate") {
  withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "true") {
    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    val optimized = Optimize.execute(query.analyze)
    val correctAnswer = relation
      .where(IsNotNull('a)).select('a)
      .join(relInSubquery.where(IsNotNull('x)).select('x), LeftSemi, Some('a === 'x))
      .analyze
    comparePlans(optimized, correctAnswer)
  }
}
```
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227215135
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
Batch("Rewrite Subquery", FixedPoint(1),
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
CollapseProject,
RemoveRedundantProject) :: Nil
}
test("Column pruning after rewriting predicate subquery") {
- val relation = LocalRelation('a.int, 'b.int)
- val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+ withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
+ val relation = LocalRelation('a.int, 'b.int)
+ val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
- val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
+ val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
- val optimized = Optimize.execute(query.analyze)
- val correctAnswer = relation
- .select('a)
- .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
- .analyze
+ val optimized = Optimize.execute(query.analyze)
+ val correctAnswer = relation
+ .select('a)
+ .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
+ .analyze
- comparePlans(optimized, correctAnswer)
+ comparePlans(optimized, correctAnswer)
+ }
+ }
+
+ test("Infer filters and push down predicate after rewriting predicate subquery") {
--- End diff --
How about keeping the test title simple and adding comments here that clearly describe what is tested?
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97655/
Test FAILed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98246/
Test PASSed.
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r229362181
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// "Extract PythonUDF From JoinCondition".
Batch("Check Cartesian Products", Once,
CheckCartesianProducts) :+
- Batch("RewriteSubquery", Once,
+ Batch("Rewrite Subquery", Once,
--- End diff --
@gatorsmile Sure, Sean. Let me give it a try.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #98294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98294/testReport)** for PR 22778 at commit [`80bf621`](https://github.com/apache/spark/commit/80bf621f5fc4d2806d116b3965374fa9d0947311).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4649/
Test PASSed.
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r229356678
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// "Extract PythonUDF From JoinCondition".
Batch("Check Cartesian Products", Once,
CheckCartesianProducts) :+
- Batch("RewriteSubquery", Once,
+ Batch("Rewrite Subquery", Once,
--- End diff --
@gatorsmile I think @dilipbiswal's suggestion is the right way to go.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227208719
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
Batch("Rewrite Subquery", FixedPoint(1),
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
CollapseProject,
RemoveRedundantProject) :: Nil
}
test("Column pruning after rewriting predicate subquery") {
- val relation = LocalRelation('a.int, 'b.int)
- val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+ withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
--- End diff --
Yes. Setting `spark.sql.constraintPropagation.enabled=false` tests `ColumnPruning` on its own.
Setting `spark.sql.constraintPropagation.enabled=true` tests `ColumnPruning`, `InferFiltersFromConstraints`, and `PushDownPredicate` together.
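To illustrate why the flag separates the two test scenarios, here is a minimal sketch in plain Scala. It assumes a toy plan model (`Relation`, `Filter`, `LeftSemiJoin`) and a hypothetical `inferFilters` function; none of these are Catalyst classes. It only mirrors the idea that the inference rule leaves the plan untouched when constraint propagation is disabled and otherwise adds null-rejecting filters on both sides of the semi join.

```scala
object InferFiltersSketch {
  // Toy plan nodes for illustration only; these are NOT Spark's Catalyst classes.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(condition: String, child: Plan) extends Plan
  case class LeftSemiJoin(left: Plan, right: Plan, leftKey: String, rightKey: String) extends Plan

  // Hypothetical stand-in for the inference rule: a no-op when the flag is off.
  def inferFilters(constraintPropagationEnabled: Boolean)(plan: Plan): Plan = plan match {
    case LeftSemiJoin(l, r, lk, rk) if constraintPropagationEnabled =>
      // An equality join key never matches on null, so null keys can be dropped early.
      LeftSemiJoin(Filter(s"isnotnull($lk)", l), Filter(s"isnotnull($rk)", r), lk, rk)
    case other => other
  }
}

object InferFiltersSketchDemo extends App {
  import InferFiltersSketch._
  val join = LeftSemiJoin(Relation("t1"), Relation("t2"), "t1.c1", "t2.c1")
  println(inferFilters(constraintPropagationEnabled = false)(join)) // plan unchanged
  println(inferFilters(constraintPropagationEnabled = true)(join))  // filters added on both sides
}
```

With the flag off, only the other rules in the batch can change the plan, which is what lets the first test isolate `ColumnPruning`.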
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227261253
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,9 +171,11 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// "Extract PythonUDF From JoinCondition".
Batch("Check Cartesian Products", Once,
CheckCartesianProducts) :+
- Batch("RewriteSubquery", Once,
+ Batch("Rewrite Subquery", Once,
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
--- End diff --
looks good, cc @gatorsmile @maryannxue
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22778
Can you put a concrete example of the missing case you described in the PR description?
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22778
We also need to add the `CombineFilters` based on https://github.com/apache/spark/pull/22879.
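For readers unfamiliar with what a filter-combining rule does, here is a minimal sketch in plain Scala. The plan model and the `combineFilters` function are hypothetical and are not Catalyst code. The assumption, not confirmed in this thread, is that inferred filters can end up stacked directly on top of existing filters after push-down, and that merging them into one node keeps the plan compact.

```scala
object CombineFiltersSketch {
  // Toy plan nodes for illustration only; these are NOT Spark's Catalyst classes.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Filter(conditions: Seq[String], child: Plan) extends Plan

  // Merges directly nested filters into a single filter holding the conjunction.
  def combineFilters(plan: Plan): Plan = plan match {
    case Filter(outer, Filter(inner, grandChild)) =>
      // Keep the inner conditions first and drop exact duplicates.
      combineFilters(Filter((inner ++ outer).distinct, grandChild))
    case Filter(conditions, child) =>
      Filter(conditions, combineFilters(child))
    case other => other
  }
}

object CombineFiltersSketchDemo extends App {
  import CombineFiltersSketch._
  val stacked = Filter(Seq("c1 > 0"), Filter(Seq("isnotnull(c1)"), Relation("t1")))
  println(combineFilters(stacked))
}
```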
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22778
Also, to make sure there is no performance regression in the optimizer, can you check the optimizer statistics for TPCDS by running `TPCDSQuerySuite`, too?
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22778
Let us hold this PR and try to fix https://github.com/apache/spark/pull/17520 instead.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #97655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97655/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r227214404
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
@@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
Batch("Rewrite Subquery", FixedPoint(1),
RewritePredicateSubquery,
ColumnPruning,
+ InferFiltersFromConstraints,
+ PushDownPredicate,
CollapseProject,
RemoveRedundantProject) :: Nil
}
test("Column pruning after rewriting predicate subquery") {
- val relation = LocalRelation('a.int, 'b.int)
- val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
+ withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
--- End diff --
Ah, I see. Thanks.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test FAILed.
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r229178084
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// "Extract PythonUDF From JoinCondition".
Batch("Check Cartesian Products", Once,
CheckCartesianProducts) :+
- Batch("RewriteSubquery", Once,
+ Batch("Rewrite Subquery", Once,
--- End diff --
I do not have a good answer for this PR. Ideally, we should rerun the whole `operatorOptimizationBatch` here. However, running all of those rules again could be very time-consuming. I would suggest adding a new parameter that sets a time limit for each batch.
cc @maryannxue WDYT?
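As a rough illustration of the time-bound idea, here is a minimal sketch in plain Scala. The `runBatch` function and its `timeBudgetNanos` parameter are hypothetical; they are not part of Spark's `RuleExecutor`. The sketch applies a batch of rules until the plan stops changing, the iteration cap is reached, or the time budget is used up, whichever comes first.

```scala
object TimeBoundedBatchSketch {
  type Rule[P] = P => P

  // Hypothetical executor loop; `now` is injectable so the budget can be tested deterministically.
  def runBatch[P](
      plan: P,
      rules: Seq[Rule[P]],
      maxIterations: Int,
      timeBudgetNanos: Long,
      now: () => Long = () => System.nanoTime()): P = {
    val deadline = now() + timeBudgetNanos
    var current = plan
    var iteration = 0
    var changed = true
    // Stop at a fixed point, at the iteration cap, or once the deadline has passed.
    while (changed && iteration < maxIterations && now() < deadline) {
      val next = rules.foldLeft(current)((p, rule) => rule(p))
      changed = next != current
      current = next
      iteration += 1
    }
    current
  }
}

object TimeBoundedBatchSketchDemo extends App {
  import TimeBoundedBatchSketch._
  // A stand-in "rule" that shrinks an integer "plan" by one step per pass.
  val shrink: Rule[Int] = n => if (n > 0) n - 1 else n
  println(runBatch(5, Seq(shrink), maxIterations = 100, timeBudgetNanos = 60L * 1000 * 1000 * 1000))
}
```

The budget is only checked between passes, so a single slow rule can still overrun it; a stricter design would also check inside the rule loop.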
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97629/
Test FAILed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4139/
Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97669/
Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98294/
Test FAILed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22778
cc @gatorsmile @cloud-fan @maropu
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #98294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98294/testReport)** for PR 22778 at commit [`80bf621`](https://github.com/apache/spark/commit/80bf621f5fc4d2806d116b3965374fa9d0947311).
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r229181821
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// "Extract PythonUDF From JoinCondition".
Batch("Check Cartesian Products", Once,
CheckCartesianProducts) :+
- Batch("RewriteSubquery", Once,
+ Batch("Rewrite Subquery", Once,
--- End diff --
@gatorsmile Do you think it's a good time to revisit Natt's PR to convert subquery expressions to joins early in the optimization process? Perhaps then we can take advantage of all the subsequent rules firing after the subquery rewrite.
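To make the rewrite under discussion concrete, here is a minimal sketch in plain Scala of turning an `IN` subquery predicate into a left semi join. The plan model and the `rewrite` function are hypothetical and are not Catalyst code. The point is only that once the plan is expressed as a join, join-oriented rules have something to act on.

```scala
object RewriteInSubquerySketch {
  // Toy plan nodes for illustration only; these are NOT Spark's Catalyst classes.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  // Models: SELECT ... FROM child WHERE column IN (SELECT subqueryColumn FROM subquery)
  case class FilterIn(column: String, subqueryColumn: String, subquery: Plan, child: Plan) extends Plan
  case class LeftSemiJoin(left: Plan, right: Plan, condition: String) extends Plan

  // Replaces each IN-subquery filter with a left semi join on the equality condition.
  def rewrite(plan: Plan): Plan = plan match {
    case FilterIn(column, subqueryColumn, subquery, child) =>
      LeftSemiJoin(rewrite(child), rewrite(subquery), s"$column = $subqueryColumn")
    case LeftSemiJoin(left, right, condition) =>
      LeftSemiJoin(rewrite(left), rewrite(right), condition)
    case other => other
  }
}

object RewriteInSubquerySketchDemo extends App {
  import RewriteInSubquerySketch._
  // Mirrors the benchmark query shape: t1.c1 IN (SELECT c1 FROM t2).
  println(rewrite(FilterIn("t1.c1", "t2.c1", Relation("t2"), Relation("t1"))))
}
```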
---
[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22778#discussion_r229361802
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
// "Extract PythonUDF From JoinCondition".
Batch("Check Cartesian Products", Once,
CheckCartesianProducts) :+
- Batch("RewriteSubquery", Once,
+ Batch("Rewrite Subquery", Once,
--- End diff --
Sure. That also sounds good to me. @dilipbiswal Could you take over the PR https://github.com/apache/spark/pull/17520?
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #97669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97669/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22778
retest this please
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4174/
Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #97629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97629/testReport)** for PR 22778 at commit [`c8d1b91`](https://github.com/apache/spark/commit/c8d1b91b93e7ad05ca0bd17984fad1c30062d504).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #97629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97629/testReport)** for PR 22778 at commit [`c8d1b91`](https://github.com/apache/spark/commit/c8d1b91b93e7ad05ca0bd17984fad1c30062d504).
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22778
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4617/
Test PASSed.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #98246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98246/testReport)** for PR 22778 at commit [`db519c3`](https://github.com/apache/spark/commit/db519c3f0241cb13880665eca934e6d6e34e6fd7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22778
**[Test build #97669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97669/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---