Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2016/09/01 22:33:22 UTC
[jira] [Updated] (SPARK-16461) Support partition batch pruning with
`<=>` (EqualNullSafe) predicate in InMemoryTableScanExec
[ https://issues.apache.org/jira/browse/SPARK-16461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-16461:
-------------------------------
Assignee: Hyukjin Kwon
> Support partition batch pruning with `<=>` (EqualNullSafe) predicate in InMemoryTableScanExec
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-16461
> URL: https://issues.apache.org/jira/browse/SPARK-16461
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Fix For: 2.1.0
>
>
> It seems the `EqualNullSafe` filter was missed when pruning partition batches in cached tables.
> Supporting it reduces the run time by roughly 75% in the benchmark below (results will vary).
> Running the code below:
> {code}
> test("Null-safe equal comparison") {
>   val N = 20000000
>   val df = spark.range(N).repartition(20)
>   val benchmark = new Benchmark("Null-safe equal comparison", N)
>   df.createOrReplaceTempView("t")
>   spark.catalog.cacheTable("t")
>   // Run the query once so the cache is materialized before timing.
>   sql("select id from t where id <=> 1").collect()
>   benchmark.addCase("Null-safe equal comparison", 10) { _ =>
>     sql("select id from t where id <=> 1").collect()
>   }
>   benchmark.run()
> }
> {code}
> produces the results below:
> Before:
> {code}
> Running benchmark: Null-safe equal comparison
> Running case: Null-safe equal comparison
> Stopped after 10 iterations, 2098 ms
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
> Null-safe equal comparison:   Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
> ------------------------------------------------------------------------------------------------
> Null-safe equal comparison            204 / 210         98.1           10.2        1.0X
> {code}
> After:
> {code}
> Running benchmark: Null-safe equal comparison
> Running case: Null-safe equal comparison
> Stopped after 10 iterations, 478 ms
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 on Mac OS X 10.11.5
> Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
> Null-safe equal comparison:   Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)    Relative
> ------------------------------------------------------------------------------------------------
> Null-safe equal comparison              42 / 48        474.1            2.1        1.0X
> {code}
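To illustrate the idea behind the description above, here is a minimal sketch of how per-batch statistics could be used to skip batches for a `<=>` predicate. It is not the code in InMemoryTableScanExec: `BatchStats`, `NullSafePruning`, and `mightMatch` are hypothetical names, and the sketch assumes each cached batch records a lower bound, an upper bound, and a null count for the column, with the bounds absent when the batch holds only nulls.

```scala
// Hypothetical per-batch statistics; illustrative only, not Spark's internal API.
// Assumption: lower/upper are None exactly when the batch has no non-null values.
final case class BatchStats(lower: Option[Long], upper: Option[Long], nullCount: Int)

object NullSafePruning {
  // Returns true when the batch MAY contain a row satisfying `col <=> literal`
  // and so must be scanned; false means the batch can be skipped safely.
  def mightMatch(stats: BatchStats, literal: Option[Long]): Boolean = literal match {
    // `col <=> NULL` is true only for rows where col is null.
    case None => stats.nullCount > 0
    // For a non-null literal v, `col <=> v` matches the same rows as `col = v`,
    // so v must fall inside the batch's [lower, upper] range.
    case Some(v) =>
      (stats.lower, stats.upper) match {
        case (Some(lo), Some(hi)) => lo <= v && v <= hi
        case _                    => false // all-null batch cannot equal v
      }
  }
}
```

With such a check, a query like `id <=> 1` only needs to scan batches whose range covers 1, for example `batches.filter(NullSafePruning.mightMatch(_, Some(1L)))`, which is consistent with the kind of speedup reported above when most batches can be skipped.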
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)