Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/10/20 01:46:00 UTC

[jira] [Commented] (SPARK-25784) Infer filters from constraints after rewriting predicate subquery

    [ https://issues.apache.org/jira/browse/SPARK-25784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657646#comment-16657646 ] 

Apache Spark commented on SPARK-25784:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/22778

> Infer filters from constraints after rewriting predicate subquery
> -----------------------------------------------------------------
>
>                 Key: SPARK-25784
>                 URL: https://issues.apache.org/jira/browse/SPARK-25784
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Benchmark:
> {code:scala}
> withTempView("t1", "t2") {
>   withTempDir { dir =>
>     spark.range(3000000)
>       .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
>       .coalesce(1)
>       .orderBy("c2")
>       .write
>       .mode("overwrite")
>       .option("parquet.block.size", 10485760)
>       .parquet(dir.getCanonicalPath)
>     spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
>     spark.read.parquet(dir.getCanonicalPath).createTempView("t2")
>     Seq("c1", "c2", "c3").foreach { column =>
>       val benchmark = new Benchmark(s"join key $column", 10)
>       Seq(false, true).foreach { inferFilters =>
>         benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
>           withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
>             sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
>           }
>         }
>       }
>       benchmark.run()
>     }
>   }
> }
> {code}
> Benchmark result:
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
> Is infer filters true                          190 /  207          0.0    18962935.7      10.6X
>
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
> Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X
>
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
> Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
> join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
> Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
> {noformat}
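
As a reading aid for the benchmark result quoted above: the Relative column appears to be the Best time of the first case (infer filters false) divided by the Best time of each case. That is an assumption about how the column is derived, but it agrees with the reported figures. The sketch below is plain Scala with no Spark dependency; the object name `RelativeSpeedup` and the helper `relative` are hypothetical and not part of Spark, and the inputs are the rounded Best times in ms copied from the tables.

```scala
object RelativeSpeedup {
  // (join key, Best ms with infer filters false, Best ms with infer filters true),
  // copied from the benchmark result quoted above.
  val bestMs: Seq[(String, Double, Double)] = Seq(
    ("c1", 2005.0, 190.0),
    ("c2", 2368.0, 1234.0),
    ("c3", 2754.0, 2237.0)
  )

  // Hypothetical helper: baseline time divided by the time of the case being compared.
  def relative(baselineMs: Double, caseMs: Double): Double = baselineMs / caseMs

  // Speedup per join key when filters are inferred.
  val speedups: Map[String, Double] =
    bestMs.map { case (key, off, on) => key -> relative(off, on) }.toMap

  def main(args: Array[String]): Unit =
    bestMs.foreach { case (key, _, _) =>
      // Round to one decimal place, as the Relative column does.
      val rounded = math.round(speedups(key) * 10) / 10.0
      println(s"join key $key: ${rounded}X")
    }
}
```

Rounded to one decimal place, the ratios come out as 10.6, 1.9 and 1.2, in line with the Relative column for c1, c2 and c3. The gain is largest for c1, where every join key value is null, and smallest for c3, where none are.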



