You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/10/20 01:43:15 UTC

[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22778

    [SPARK-25784][SQL] Infer filters from constraints after rewriting predicate subquery

    ## What changes were proposed in this pull request?
    
    Infer filters from constraints after rewriting predicate subquery.
    
    ## How was this patch tested?
    unit tests and benchmark tests
    ```scala
    withTempView("t1", "t2") {
      withTempDir { dir =>
        spark.range(3000000)
          .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
          .coalesce(1)
          .orderBy("c2")
          .write
          .mode("overwrite")
          .option("parquet.block.size", 10485760)
          .parquet(dir.getCanonicalPath)
    
        spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
        spark.read.parquet(dir.getCanonicalPath).createTempView("t2")
    
        Seq("c1", "c2", "c3").foreach { column =>
          val benchmark = new Benchmark(s"join key $column", 10)
          Seq(false, true).foreach { inferFilters =>
            benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
              withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
                sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
              }
            }
          }
          benchmark.run()
        }
      }
    }
    ```
    
    ```
    ava HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
    Is infer filters true                          190 /  207          0.0    18962935.7      10.6X
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
    Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X
    
    Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
    Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
    join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
    Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
    ```


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25784

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22778
    
----
commit c8d1b91b93e7ad05ca0bd17984fad1c30062d504
Author: Yuming Wang <yu...@...>
Date:   2018-10-20T01:39:51Z

    Infer filters from constraints after rewriting predicate subquery

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227205054
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
           Batch("Rewrite Subquery", FixedPoint(1),
             RewritePredicateSubquery,
             ColumnPruning,
    +        InferFiltersFromConstraints,
    +        PushDownPredicate,
             CollapseProject,
             RemoveRedundantProject) :: Nil
       }
     
       test("Column pruning after rewriting predicate subquery") {
    -    val relation = LocalRelation('a.int, 'b.int)
    -    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    +    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    --- End diff --
    
    We need to modify this existing test?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227214593
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
           Batch("Rewrite Subquery", FixedPoint(1),
             RewritePredicateSubquery,
             ColumnPruning,
    +        InferFiltersFromConstraints,
    +        PushDownPredicate,
             CollapseProject,
             RemoveRedundantProject) :: Nil
       }
     
       test("Column pruning after rewriting predicate subquery") {
    -    val relation = LocalRelation('a.int, 'b.int)
    -    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    +    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    +      val relation = LocalRelation('a.int, 'b.int)
    +      val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
     
    -    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    +      val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = relation
    -      .select('a)
    -      .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
    -      .analyze
    +      val optimized = Optimize.execute(query.analyze)
    +      val correctAnswer = relation
    +        .select('a)
    +        .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
    +        .analyze
     
    -    comparePlans(optimized, correctAnswer)
    +      comparePlans(optimized, correctAnswer)
    +    }
    +  }
    +
    +  test("Infer filters and push down predicate after rewriting predicate subquery") {
    --- End diff --
    
    Need the column pruning in the test title?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #97655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97655/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4162/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #98246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98246/testReport)** for PR 22778 at commit [`db519c3`](https://github.com/apache/spark/commit/db519c3f0241cb13880665eca934e6d6e34e6fd7).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    @maropu This is optimizer statistics before and after this patch.
    ```
    === Metrics of Analyzer/Optimizer Rules before this patch ===                                                                                                                                                 === Metrics of Analyzer/Optimizer Rules after this patch ===
    Total number of runs: 220860                                                                                                                                                                                  Total number of runs: 221842
    Total time: 92.850628879 seconds                                                                                                                                                                              Total time: 100.810327761 seconds
    
    Rule                                                                                               Effective Time / Total Time                     Effective Runs / Total Runs                                Effective Time / Total Time                     Effective Runs / Total Runs
    
    org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries                               10132656768 / 10253620575                       47 / 390                                                   12304351112 / 12424750794                       47 / 390
    org.apache.spark.sql.catalyst.optimizer.ColumnPruning                                              1880308393 / 8329162616                         391 / 2333                                                 2117896745 / 9071585339                         391 / 2333
    org.apache.spark.sql.catalyst.analysis.Analyzer$CTESubstitution                                    5092672075 / 5110064107                         58 / 890                                                   4887656663 / 4903590901                         58 / 890
    org.apache.spark.sql.catalyst.optimizer.PruneFilters                                               61752234 / 4105761812                           5 / 1943                                                   70559207 / 4465280263                           5 / 1943
    org.apache.spark.sql.catalyst.optimizer.ReorderJoin                                                2721845400 / 3697707596                         181 / 1943                                                 3356876470 / 4365735954                         181 / 1943
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences                                  2152781072 / 3114708964                         846 / 2296                                                 2014517480 / 3580105409                         312 / 780
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggregateFunctions                          875336763 / 2522798268                          45 / 2296                                                  2054648919 / 3061650250                         846 / 2296
    org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin                                   1301162184 / 1925078331                         811 / 1943                                                 880342476 / 2535797956                          45 / 2296
    org.apache.spark.sql.catalyst.optimizer.BooleanSimplification                                      35280989 / 1822909975                           9 / 1943                                                   1415726619 / 2070669333                         811 / 1943
    org.apache.spark.sql.catalyst.analysis.DecimalPrecision                                            1486561519 / 1791324588                         288 / 2296                                                 38775682 / 1883707910                           9 / 1943
    org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints                                1289610205 / 1384744531                         278 / 390                                                  1444130367 / 1745543467                         288 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$FixNullability                                     29636254 / 1355280611                           12 / 820                                                   1109816330 / 1619640495                         853 / 2333
    org.apache.spark.sql.catalyst.optimizer.ConstantFolding                                            206713838 / 1331971096                          194 / 1943                                                 69725969 / 1417183573                           42 / 1943
    org.apache.spark.sql.catalyst.optimizer.PushDownPredicate                                          959279687 / 1331761891                          820 / 1943                                                 0 / 1404733663                                  0 / 1943
    org.apache.spark.sql.catalyst.optimizer.NullPropagation                                            65910473 / 1324952383                           42 / 1943                                                  195479597 / 1340347970                          194 / 1943
    org.apache.spark.sql.catalyst.optimizer.ReorderAssociativeOperator                                 0 / 1268803431                                  0 / 1943                                                   0 / 1331100388                                  0 / 1943
    org.apache.spark.sql.catalyst.optimizer.OptimizeIn                                                 23201608 / 1258483307                           27 / 1943                                                  1105179405 / 1317882379                         59 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubquery                                    1040041843 / 1251992635                         59 / 2296                                                  74349060 / 1291276637                           84 / 1943
    org.apache.spark.sql.catalyst.optimizer.LikeSimplification                                         1347508 / 1248925860                            1 / 1943                                                   0 / 1289545537                                  0 / 1943
    org.apache.spark.sql.catalyst.optimizer.EliminateOuterJoin                                         45563225 / 1246306320                           15 / 1943                                                  0 / 1288186620                                  0 / 1943
    org.apache.spark.sql.catalyst.optimizer.SimplifyConditionals                                       0 / 1234509354                                  0 / 1943                                                   23417314 / 1279842595                           27 / 1943
    org.apache.spark.sql.catalyst.optimizer.SimplifyBinaryComparison                                   0 / 1224533697                                  0 / 1943                                                   0 / 1279108156                                  0 / 1943
    org.apache.spark.sql.catalyst.optimizer.SimplifyCasts                                              75406425 / 1215567094                           84 / 1943                                                  1361911 / 1268795854                            1 / 1943
    org.apache.spark.sql.catalyst.optimizer.SimplifyCaseConversionExpressions                          0 / 1196031525                                  0 / 1943                                                   42577656 / 1223661384                           15 / 1943
    org.apache.spark.sql.catalyst.optimizer.RemoveDispensableExpressions                               0 / 1184848914                                  0 / 1943                                                   0 / 1200658411                                  0 / 1943
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveMissingReferences                           31233040 / 1184415886                           10 / 2296                                                  29312354 / 1162683260                           10 / 2296
    org.apache.spark.sql.catalyst.optimizer.SimplifyExtractValueOps                                    0 / 1159058474                                  0 / 1943                                                   60950940 / 1155752729                           47 / 2333
    org.apache.spark.sql.catalyst.optimizer.RemoveRedundantAliases                                     5698821 / 1153787317                            10 / 1943                                                  122109389 / 1152651163                          155 / 2333
    org.apache.spark.sql.catalyst.optimizer.RemoveRedundantProject                                     78177701 / 1123087719                           47 / 2333                                                  6731330 / 1135242708                            10 / 1943
    org.apache.spark.sql.catalyst.optimizer.CollapseProject                                            123874416 / 1101850924                          155 / 2333                                                 10567406 / 1130613449                           8 / 1943
    org.apache.spark.sql.catalyst.optimizer.CombineFilters                                             449980198 / 1051429309                          636 / 1943                                                 30638232 / 1111306240                           12 / 820
    org.apache.spark.sql.catalyst.optimizer.RewriteCorrelatedScalarSubquery                            9712531 / 1010330915                            8 / 1943                                                   29494556 / 1058321219                           43 / 2333
    org.apache.spark.sql.catalyst.optimizer.CombineUnions                                              31689453 / 991412499                            43 / 2333                                                  0 / 1057978599                                  0 / 1943
    org.apache.spark.sql.catalyst.optimizer.CollapseRepartition                                        0 / 985591919                                   0 / 1943                                                   399285885 / 1036327468                          111 / 2296
    org.apache.spark.sql.catalyst.optimizer.ConstantPropagation                                        0 / 953077628                                   0 / 1943                                                   476914099 / 1011633729                          636 / 1943
    org.apache.spark.sql.catalyst.optimizer.EliminateSorts                                             0 / 909291634                                   0 / 1943                                                   0 / 1001606023                                  0 / 1943
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$ImplicitTypeCasts                              365802139 / 908733912                           111 / 2296                                                 0 / 997621178                                   0 / 1943
    org.apache.spark.sql.catalyst.optimizer.CollapseWindow                                             0 / 897540927                                   0 / 1943                                                   0 / 927316666                                   0 / 1943
    org.apache.spark.sql.catalyst.optimizer.CombineLimits                                              0 / 875781956                                   0 / 1943                                                   0 / 922370179                                   0 / 1943
    org.apache.spark.sql.catalyst.optimizer.LimitPushDown                                              0 / 868110787                                   0 / 1943                                                   0 / 917919176                                   0 / 1943
    org.apache.spark.sql.catalyst.optimizer.PushProjectionThroughUnion                                 26326365 / 865776545                            21 / 1943                                                  23992497 / 900311285                            21 / 1943
    org.apache.spark.sql.catalyst.optimizer.EliminateSerialization                                     0 / 854607673                                   0 / 1943                                                   0 / 892746271                                   0 / 1943
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$FunctionArgumentConversion                     262665875 / 579658305                           62 / 2296                                                  260616344 / 593811716                           62 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions                                   153745818 / 475319649                           408 / 2296                                                 241689927 / 455832126                           33 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$CaseWhenCoercion                               235816769 / 455590494                           33 / 2296                                                  152047701 / 451570922                           408 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$PromoteStrings                                 13635691 / 355369662                            11 / 2296                                                  9052332 / 383939387                             4 / 2296
    org.apache.spark.sql.catalyst.analysis.ResolveTimeZone                                             146691973 / 323105943                           566 / 2296                                                 15974079 / 369836047                            11 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$InConversion                                   6787826 / 321386706                             4 / 2296                                                   0 / 364936587                                   0 / 390
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$Division                                       31419357 / 319844433                            10 / 2296                                                  147860974 / 331495653                           566 / 2296
    org.apache.spark.sql.catalyst.optimizer.UpdateNullabilityInAttributeReferences                     667532 / 316522643                              1 / 390                                                    32499587 / 329200084                            10 / 2296
    org.apache.spark.sql.catalyst.optimizer.PropagateEmptyRelation                                     9098104 / 315856886                             5 / 785                                                    635601 / 324689035                              1 / 390
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$DateTimeOperations                             9601865 / 307128084                             43 / 2296                                                  0 / 321984844                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$IfCoercion                                     0 / 304912772                                   0 / 2296                                                   8226740 / 319740326                             43 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$BooleanEquality                                0 / 304196921                                   0 / 2296                                                   0 / 315499126                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator                                   0 / 300839906                                   0 / 2296                                                   0 / 313318332                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowOrder                                 48154044 / 291379133                            24 / 2296                                                  0 / 312265841                                   0 / 390
    org.apache.spark.sql.catalyst.optimizer.DecimalAggregates                                          108634840 / 289690009                           124 / 514                                                  0 / 308258712                                   0 / 2296
    org.apache.spark.sql.execution.python.ExtractPythonUDFs                                            0 / 278550649                                   0 / 390                                                    103129068 / 289052907                           124 / 514
    org.apache.spark.sql.catalyst.analysis.ResolveCreateNamedStruct                                    0 / 273938538                                   0 / 2296                                                   0 / 265747445                                   0 / 2296
    org.apache.spark.sql.catalyst.optimizer.RewriteIntersectAll                                        0 / 268165845                                   0 / 432                                                    10768959 / 260676307                            24 / 2296
    org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates                                 74395591 / 239152525                            27 / 390                                                   0 / 249739752                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$ConcatCoercion                                 0 / 236351130                                   0 / 2296                                                   13236741 / 244925139                            5 / 785
    org.apache.spark.sql.catalyst.analysis.TimeWindowing                                               0 / 235225556                                   0 / 2296                                                   0 / 244460591                                   0 / 390
    org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions                               0 / 230404877                                   0 / 390                                                    204172100 / 242598141                           222 / 390
    org.apache.spark.sql.catalyst.analysis.ResolveHigherOrderFunctions                                 0 / 229768333                                   0 / 2296                                                   12417039 / 235806609                            37 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame                                 10792824 / 228415803                            37 / 2296                                                  0 / 232472018                                   0 / 2296
    org.apache.spark.sql.execution.datasources.FindDataSourceTable                                     179627028 / 228013811                           294 / 2296                                                 184333924 / 228692095                           294 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$EltCoercion                                    0 / 225389128                                   0 / 2296                                                   59881230 / 226584657                            27 / 390
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$WindowFrameCoercion                            0 / 218608000                                   0 / 2296                                                   0 / 221887078                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$StackCoercion                                  0 / 218194486                                   0 / 2296                                                   0 / 215732842                                   0 / 2296
    org.apache.spark.sql.catalyst.optimizer.RewritePredicateSubquery                                   173523514 / 214927844                           222 / 390                                                  0 / 213744878                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$MapZipWithCoercion                             0 / 214340662                                   0 / 2296                                                   0 / 213295968                                   0 / 2296
    org.apache.spark.sql.catalyst.optimizer.FoldablePropagation                                        0 / 199432071                                   0 / 1943                                                   0 / 194465130                                   0 / 1943
    org.apache.spark.sql.catalyst.optimizer.EliminateMapObjects                                        0 / 185179070                                   0 / 390                                                    0 / 178698382                                   0 / 390
    org.apache.spark.sql.execution.OptimizeMetadataOnlyQuery                                           0 / 176454024                                   0 / 390                                                    0 / 173558271                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractWindowExpressions                           39309018 / 176186699                            38 / 2296                                                  43715379 / 172272368                            38 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRandomSeed                                  0 / 170022403                                   0 / 2296                                                   0 / 165447407                                   0 / 390
    org.apache.spark.sql.execution.python.ExtractPythonUDFFromAggregate                                0 / 159757503                                   0 / 390                                                    0 / 160391099                                   0 / 390
    org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTime                                         0 / 158140055                                   0 / 390                                                    41644289 / 159215748                            12 / 2296
    org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases                                    141871895 / 156820337                           296 / 390                                                  0 / 158779735                                   0 / 390
    org.apache.spark.sql.catalyst.analysis.ResolveLambdaVariables                                      0 / 152376408                                   0 / 2296                                                   0 / 157874334                                   0 / 390
    org.apache.spark.sql.catalyst.optimizer.ReplaceExpressions                                         0 / 152316462                                   0 / 390                                                    0 / 157132550                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics                           42422112 / 151049951                            12 / 2296                                                  140810447 / 156338363                           296 / 390
    org.apache.spark.sql.catalyst.optimizer.GetCurrentDatabase                                         0 / 137267577                                   0 / 390                                                    24557398 / 151324074                            42 / 432
    org.apache.spark.sql.catalyst.optimizer.PullOutPythonUDFInJoinCondition                            0 / 133692760                                   0 / 390                                                    71287846 / 142242298                            507 / 1327
    org.apache.spark.sql.catalyst.analysis.CleanupAliases                                              68056569 / 131655681                            507 / 1327                                                 0 / 134581169                                   0 / 390
    org.apache.spark.sql.catalyst.optimizer.RemoveRedundantSorts                                       0 / 130701173                                   0 / 390                                                    0 / 133089737                                   0 / 390
    org.apache.spark.sql.catalyst.analysis.Analyzer$PullOutNondeterministic                            0 / 127233011                                   0 / 820                                                    0 / 130835073                                   0 / 390
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAliases                                     17107572 / 125147408                            53 / 2296                                                  0 / 128705527                                   0 / 820
    org.apache.spark.sql.catalyst.optimizer.CombineTypedFilters                                        0 / 123463473                                   0 / 390                                                    0 / 125210046                                   0 / 432
    org.apache.spark.sql.catalyst.optimizer.RewriteExceptAll                                           0 / 114173237                                   0 / 432                                                    12476277 / 121836057                            24 / 432
    org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin                               10889547 / 112652867                            24 / 432                                                   21381778 / 118895727                            53 / 2296
    org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithFilter                                    0 / 110008368                                   0 / 432                                                    0 / 116345587                                   0 / 432
    org.apache.spark.sql.catalyst.optimizer.ReplaceDistinctWithAggregate                               21610795 / 109828668                            42 / 432                                                   0 / 113808944                                   0 / 432
    org.apache.spark.sql.catalyst.optimizer.ReplaceExceptWithAntiJoin                                  1216445 / 108203801                             1 / 432                                                    1720957 / 112284623                             1 / 432
    org.apache.spark.sql.catalyst.optimizer.RewriteDistinctAggregates                                  0 / 107287411                                   0 / 390                                                    59563957 / 106598698                            294 / 2306
    org.apache.spark.sql.catalyst.optimizer.RemoveRepetitionFromGroupExpressions                       2159170 / 104265980                             2 / 392                                                    0 / 106179171                                   0 / 390
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer                                0 / 103215140                                   0 / 2296                                                   0 / 103787927                                   0 / 390
    org.apache.spark.sql.catalyst.analysis.EliminateView                                               0 / 102296786                                   0 / 390                                                    0 / 103305027                                   0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations                                   50917277 / 97159389                             294 / 2306                                                 4385370 / 102288812                             2 / 392
    org.apache.spark.sql.catalyst.optimizer.RemoveLiteralFromGroupExpressions                          0 / 92841832                                    0 / 392                                                    0 / 97691358                                    0 / 392
    org.apache.spark.sql.catalyst.analysis.Analyzer$GlobalAggregates                                   3915170 / 92591154                              62 / 2296                                                  4051673 / 94971539                              62 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$LookupFunctions                                    0 / 89117456                                    0 / 830                                                    0 / 93198423                                    0 / 390
    org.apache.spark.sql.catalyst.optimizer.ReplaceDeduplicateWithAggregate                            0 / 88821013                                    0 / 390                                                    0 / 90764806                                    0 / 830
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast                                      0 / 84561708                                    0 / 2296                                                   19857036 / 89051825                             24 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNewInstance                                 0 / 84221700                                    0 / 2296                                                   0 / 82792409                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.TypeCoercion$WidenSetOperationTypes                         21598496 / 83356934                             24 / 2296                                                  0 / 81606995                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$HandleNullInputsForUDF                             0 / 71959898                                    0 / 820                                                    2297618 / 79871191                              8 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOrdinalInOrderByAndGroupBy                  2555449 / 69189122                              8 / 2296                                                   0 / 70548261                                    0 / 820
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolvePivot                                       0 / 63485517                                    0 / 2296                                                   45895772 / 59588549                             24 / 820
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveAggAliasInGroupBy                           0 / 58879583                                    0 / 2296                                                   0 / 57888686                                    0 / 2296
    org.apache.spark.sql.catalyst.expressions.codegen.package$ExpressionCanonicalizer$CleanExpressions 80429 / 57856725                                76 / 11912                                                 0 / 54729566                                    0 / 2296
    org.apache.spark.sql.execution.datasources.DataSourceAnalysis                                      41813139 / 55552144                             24 / 820                                                   0 / 54556018                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.ResolveTableValuedFunctions                                 0 / 53367328                                    0 / 2306                                                   0 / 50850649                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGenerate                                    0 / 51405189                                    0 / 2296                                                   72764 / 50825747                                76 / 12114
    org.apache.spark.sql.catalyst.analysis.ResolveInlineTables                                         0 / 51249460                                    0 / 2296                                                   0 / 50057052                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveNaturalAndUsingJoin                         0 / 50353332                                    0 / 2296                                                   0 / 49708235                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveSubqueryColumnAliases                       0 / 49502239                                    0 / 2296                                                   0 / 48841848                                    0 / 2296
    org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveOutputRelation                              0 / 49417950                                    0 / 2296                                                   0 / 48215992                                    0 / 2296
    org.apache.spark.sql.execution.datasources.ResolveSQLOnFile                                        0 / 49374943                                    0 / 2296                                                   0 / 47911673                                    0 / 2306
    org.apache.spark.sql.execution.datasources.PreprocessTableCreation                                 0 / 30942645                                    0 / 820                                                    0 / 33275799                                    0 / 820
    org.apache.spark.sql.catalyst.analysis.SubstituteUnresolvedOrdinals                                4537227 / 30392738                              8 / 890                                                    4410251 / 32033067                              8 / 890
    org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveBroadcastHints                          0 / 27361418                                    0 / 830                                                    0 / 28707886                                    0 / 830
    org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts                                     0 / 22963407                                    0 / 390                                                    0 / 24315925                                    0 / 390
    org.apache.spark.sql.catalyst.analysis.EliminateUnions                                             0 / 22593930                                    0 / 890                                                    0 / 20247865                                    0 / 820
    org.apache.spark.sql.catalyst.analysis.Analyzer$WindowsSubstitution                                0 / 22089269                                    0 / 890                                                    0 / 19302887                                    0 / 890
    org.apache.spark.sql.catalyst.analysis.UpdateOuterReferences                                       0 / 20769130                                    0 / 820                                                    0 / 19285784                                    0 / 890
    org.apache.spark.sql.catalyst.analysis.ResolveHints$ResolveCoalesceHints                           0 / 20604280                                    0 / 830                                                    0 / 16660442                                    0 / 830
    org.apache.spark.sql.catalyst.analysis.AliasViewChild                                              0 / 15787426                                    0 / 820                                                    0 / 15849217                                    0 / 820
    org.apache.spark.sql.execution.datasources.PreprocessTableInsertion                                0 / 14039160                                    0 / 820                                                    0 / 13514297                                    0 / 820
    org.apache.spark.sql.catalyst.analysis.ResolveHints$RemoveAllHints                                 0 / 12451609                                    0 / 830                                                    0 / 12090842                                    0 / 830
    org.apache.spark.sql.catalyst.optimizer.CombineConcats                                             0 / 9915780                                     0 / 1943                                                   0 / 9786602                                     0 / 1943
    org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder                                       0 / 4431280                                     0 / 390                                                    0 / 4314429                                     0 / 390
    org.apache.spark.sql.catalyst.optimizer.EliminateDistinct                                          0 / 4138833                                     0 / 390                                                    0 / 3707721                                     0 / 390
    org.apache.spark.sql.execution.datasources.parquet.ParquetSchemaPruning                            0 / 3070796                                     0 / 390                                                    0 / 3494547                                     0 / 390
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227236021
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
           Batch("Rewrite Subquery", FixedPoint(1),
             RewritePredicateSubquery,
             ColumnPruning,
    +        InferFiltersFromConstraints,
    +        PushDownPredicate,
             CollapseProject,
             RemoveRedundantProject) :: Nil
       }
     
       test("Column pruning after rewriting predicate subquery") {
    -    val relation = LocalRelation('a.int, 'b.int)
    -    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    +    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    +      val relation = LocalRelation('a.int, 'b.int)
    +      val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
     
    -    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    +      val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = relation
    -      .select('a)
    -      .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
    -      .analyze
    +      val optimized = Optimize.execute(query.analyze)
    +      val correctAnswer = relation
    +        .select('a)
    +        .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
    +        .analyze
     
    -    comparePlans(optimized, correctAnswer)
    +      comparePlans(optimized, correctAnswer)
    +    }
    +  }
    +
    +  test("Infer filters and push down predicate after rewriting predicate subquery") {
    --- End diff --
    
    How about refactor these test to:
    ```scala
      val relation = LocalRelation('a.int, 'b.int)
      val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    
      test("Column pruning") {
        withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
          val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    
          val optimized = Optimize.execute(query.analyze)
          val correctAnswer = relation
            .select('a)
            .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
            .analyze
    
          comparePlans(optimized, correctAnswer)
        }
      }
    
      test("Column pruning, infer filters and push down predicate") {
        withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "true") {
          val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    
          val optimized = Optimize.execute(query.analyze)
          val correctAnswer = relation
            .where(IsNotNull('a)).select('a)
            .join(relInSubquery.where(IsNotNull('x)).select('x), LeftSemi, Some('a === 'x))
            .analyze
    
          comparePlans(optimized, correctAnswer)
        }
      }
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227215135
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
           Batch("Rewrite Subquery", FixedPoint(1),
             RewritePredicateSubquery,
             ColumnPruning,
    +        InferFiltersFromConstraints,
    +        PushDownPredicate,
             CollapseProject,
             RemoveRedundantProject) :: Nil
       }
     
       test("Column pruning after rewriting predicate subquery") {
    -    val relation = LocalRelation('a.int, 'b.int)
    -    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    +    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    +      val relation = LocalRelation('a.int, 'b.int)
    +      val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
     
    -    val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
    +      val query = relation.where('a.in(ListQuery(relInSubquery.select('x)))).select('a)
     
    -    val optimized = Optimize.execute(query.analyze)
    -    val correctAnswer = relation
    -      .select('a)
    -      .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
    -      .analyze
    +      val optimized = Optimize.execute(query.analyze)
    +      val correctAnswer = relation
    +        .select('a)
    +        .join(relInSubquery.select('x), LeftSemi, Some('a === 'x))
    +        .analyze
     
    -    comparePlans(optimized, correctAnswer)
    +      comparePlans(optimized, correctAnswer)
    +    }
    +  }
    +
    +  test("Infer filters and push down predicate after rewriting predicate subquery") {
    --- End diff --
    
    How about making the test title simple, then leaving comments about what's tested clearly here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97655/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98246/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r229362181
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
         // "Extract PythonUDF From JoinCondition".
         Batch("Check Cartesian Products", Once,
           CheckCartesianProducts) :+
    -    Batch("RewriteSubquery", Once,
    +    Batch("Rewrite Subquery", Once,
    --- End diff --
    
    @gatorsmile Sure Sean.. Let me give it a try.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #98294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98294/testReport)** for PR 22778 at commit [`80bf621`](https://github.com/apache/spark/commit/80bf621f5fc4d2806d116b3965374fa9d0947311).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4649/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r229356678
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
         // "Extract PythonUDF From JoinCondition".
         Batch("Check Cartesian Products", Once,
           CheckCartesianProducts) :+
    -    Batch("RewriteSubquery", Once,
    +    Batch("Rewrite Subquery", Once,
    --- End diff --
    
    @gatorsmile I think @dilipbiswal's suggestion is the right way to go.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227208719
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
           Batch("Rewrite Subquery", FixedPoint(1),
             RewritePredicateSubquery,
             ColumnPruning,
    +        InferFiltersFromConstraints,
    +        PushDownPredicate,
             CollapseProject,
             RemoveRedundantProject) :: Nil
       }
     
       test("Column pruning after rewriting predicate subquery") {
    -    val relation = LocalRelation('a.int, 'b.int)
    -    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    +    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    --- End diff --
    
    Yes, `spark.sql.constraintPropagation.enabled=false` to test `ColumnPruning`.
    `spark.sql.constraintPropagation.enabled=true` to test `ColumnPruning`, `InferFiltersFromConstraints` and `PushDownPredicate`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227261253
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,9 +171,11 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
         // "Extract PythonUDF From JoinCondition".
         Batch("Check Cartesian Products", Once,
           CheckCartesianProducts) :+
    -    Batch("RewriteSubquery", Once,
    +    Batch("Rewrite Subquery", Once,
           RewritePredicateSubquery,
           ColumnPruning,
    +      InferFiltersFromConstraints,
    +      PushDownPredicate,
    --- End diff --
    
    looks good, cc @gatorsmile @maryannxue 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Can you put the concrete example of the missing case you described in the PR description?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    We also need to add the `CombineFilters` based on https://github.com/apache/spark/pull/22879.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Also, to make sure no performance regression in the optimizer, can you check optimizer statistics in TPCDS by running `TPCDSQuerySuite`, too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Let us hold this PR and try to fix https://github.com/apache/spark/pull/17520 instead.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #97655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97655/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r227214404
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RewriteSubquerySuite.scala ---
    @@ -33,23 +34,44 @@ class RewriteSubquerySuite extends PlanTest {
           Batch("Rewrite Subquery", FixedPoint(1),
             RewritePredicateSubquery,
             ColumnPruning,
    +        InferFiltersFromConstraints,
    +        PushDownPredicate,
             CollapseProject,
             RemoveRedundantProject) :: Nil
       }
     
       test("Column pruning after rewriting predicate subquery") {
    -    val relation = LocalRelation('a.int, 'b.int)
    -    val relInSubquery = LocalRelation('x.int, 'y.int, 'z.int)
    +    withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> "false") {
    --- End diff --
    
    Ah, I see. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r229178084
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
         // "Extract PythonUDF From JoinCondition".
         Batch("Check Cartesian Products", Once,
           CheckCartesianProducts) :+
    -    Batch("RewriteSubquery", Once,
    +    Batch("Rewrite Subquery", Once,
    --- End diff --
    
    I do not have a good answer for this PR. Ideally, we should run the whole batch `operatorOptimizationBatch`. However, running the rules could be very time consuming. I would suggest to add a new parameter for introducing the time bound limit for each batch.
    
    cc @maryannxue  WDYT?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97629/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4139/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97669/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98294/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    cc @gatorsmile @cloud-fan @maropu


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #98294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98294/testReport)** for PR 22778 at commit [`80bf621`](https://github.com/apache/spark/commit/80bf621f5fc4d2806d116b3965374fa9d0947311).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r229181821
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
         // "Extract PythonUDF From JoinCondition".
         Batch("Check Cartesian Products", Once,
           CheckCartesianProducts) :+
    -    Batch("RewriteSubquery", Once,
    +    Batch("Rewrite Subquery", Once,
    --- End diff --
    
    @gatorsmile Do you think its a good time to revisit Natt's PR to convert subquery expressions to Joins early in the optimization process ?  Perhaps then we can take advantage of all the subsequent rules firing after the subquery rewrite ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22778: [SPARK-25784][SQL] Infer filters from constraints...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22778#discussion_r229361802
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -171,10 +171,13 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
         // "Extract PythonUDF From JoinCondition".
         Batch("Check Cartesian Products", Once,
           CheckCartesianProducts) :+
    -    Batch("RewriteSubquery", Once,
    +    Batch("Rewrite Subquery", Once,
    --- End diff --
    
    Sure. That sounds also good to me. @dilipbiswal Could you take the PR https://github.com/apache/spark/pull/17520 over?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #97669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97669/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4174/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #97629 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97629/testReport)** for PR 22778 at commit [`c8d1b91`](https://github.com/apache/spark/commit/c8d1b91b93e7ad05ca0bd17984fad1c30062d504).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #97629 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97629/testReport)** for PR 22778 at commit [`c8d1b91`](https://github.com/apache/spark/commit/c8d1b91b93e7ad05ca0bd17984fad1c30062d504).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4617/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #98246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98246/testReport)** for PR 22778 at commit [`db519c3`](https://github.com/apache/spark/commit/db519c3f0241cb13880665eca934e6d6e34e6fd7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22778: [SPARK-25784][SQL] Infer filters from constraints after ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22778
  
    **[Test build #97669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97669/testReport)** for PR 22778 at commit [`6596327`](https://github.com/apache/spark/commit/6596327c345e3d7f22c7aafe916356778b0e9934).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org