You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Yuming Wang (JIRA)" <ji...@apache.org> on 2018/10/20 01:30:00 UTC

[jira] [Created] (SPARK-25784) Infer filters from constraints after rewriting predicate subquery

Yuming Wang created SPARK-25784:
-----------------------------------

             Summary: Infer filters from constraints after rewriting predicate subquery
                 Key: SPARK-25784
                 URL: https://issues.apache.org/jira/browse/SPARK-25784
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Yuming Wang


Benchmark:
{code:scala}
withTempView("t1", "t2") {
  withTempDir { dir =>
    spark.range(3000000)
      .selectExpr("cast(null as int) as c1", "if(id % 2 = 0, null, id) as c2", "id as c3")
      .coalesce(1)
      .orderBy("c2")
      .write
      .mode("overwrite")
      .option("parquet.block.size", 10485760)
      .parquet(dir.getCanonicalPath)

    spark.read.parquet(dir.getCanonicalPath).createTempView("t1")
    spark.read.parquet(dir.getCanonicalPath).createTempView("t2")

    Seq("c1", "c2", "c3").foreach { column =>
      val benchmark = new Benchmark(s"join key $column", 10)
      Seq(false, true).foreach { inferFilters =>
        benchmark.addCase(s"Is infer filters $inferFilters", numIters = 5) { _ =>
          withSQLConf(SQLConf.CONSTRAINT_PROPAGATION_ENABLED.key -> inferFilters.toString) {
            sql(s"select t1.* from t1 where t1.$column in (select $column from t2)").count()
          }
        }
      }
      benchmark.run()
    }
  }
}
{code}

Benchmark result:
{noformat}
Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c1:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2005 / 2163          0.0   200481431.0       1.0X
Is infer filters true                          190 /  207          0.0    18962935.7      10.6X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c2:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2368 / 2498          0.0   236803743.1       1.0X
Is infer filters true                         1234 / 1268          0.0   123443912.3       1.9X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_151-b12 on Mac OS X 10.12.6
Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
join key c3:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Is infer filters false                        2754 / 2907          0.0   275376009.7       1.0X
Is infer filters true                         2237 / 2255          0.0   223739457.8       1.2X
{noformat}





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org