Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2018/10/20 01:30:00 UTC
[jira] [Created] (SPARK-25784) Infer filters from constraints after rewriting predicate subquery
Yuming Wang created SPARK-25784:
-----------------------------------
Summary: Infer filters from constraints after rewriting predicate subquery
Key: SPARK-25784
URL: https://issues.apache.org/jira/browse/SPARK-25784
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang
Benchmark:
{code:scala}
withTempView("t1", "t2") {
  withTempDir { dir =>
    // c1 is always NULL, c2 is NULL for even ids, c3 is never NULL.
    spark.range(3000000)
      .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
      .coalesce(1)
      .orderBy("c2")
      .write
      .mode("overwrite")
      .option("parquet.block.size", 10485760)
      .parquet(dir.getCanonicalPath)

    // Both views read the same Parquet data.
    spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
    spark.read.parquet(dir.getCanonicalPath).createTempView("t2")

    // One benchmark per join key, with constraint propagation off and on.
    Seq("c1", "c2", "c3").foreach { column =>
      val benchmark = new Benchmark(s"join key $column", 10)
      Seq(false, true).foreach { inferFilters =>
        benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
          withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
            sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
          }
        }
      }
      benchmark.run()
    }
  }
}
{code}
Benchmark result:
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz

join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
Is infer filters true                          190 /  207          0.0    18962935.7      10.6X

join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X

join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
{noformat}
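The benchmark above only reports timings. As a rough way to see what the setting changes, the query plans can be printed with the setting off and on. This is a minimal sketch and not part of the original report: the object name InferFiltersPlanCheck, the local[1] session and the 10-row stand-in views are assumptions made for illustration, and spark.sql.constraintPropagation.enabled is assumed to be the key behind SQLConf.CONSTRAINT_PROPAGATION_ENABLED. Whether the two plans actually differ (for example by an extra isnotnull filter on the join key) depends on the Spark build in use.
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical helper for illustration only; not from the original issue.
object InferFiltersPlanCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-25784 plan check")
      .getOrCreate()

    // Small in-memory stand-ins for the Parquet-backed t1/t2 of the benchmark.
    Seq("t1", "t2").foreach { view =>
      spark.range(10)
        .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
        .createOrReplaceTempView(view)
    }

    // Print the plans for the same IN-subquery with the setting off and on.
    Seq(false, true).foreach { enabled =>
      // Assumed key for SQLConf.CONSTRAINT_PROPAGATION_ENABLED.
      spark.conf.set("spark.sql.constraintPropagation.enabled", enabled.toString)
      println(s"constraint propagation enabled = $enabled")
      spark.sql("select t1.* from t1 where t1.c2 in (select c2 from t2)").explain(true)
    }

    spark.stop()
  }
}
{code}
Comparing the "Optimized Logical Plan" sections of the two printouts is one way to check whether any filter was inferred on either side of the join produced from the IN subquery.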