Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2023/04/10 08:00:00 UTC
[jira] [Commented] (SPARK-40609) Casts types according to bucket info for Equality expression
[ https://issues.apache.org/jira/browse/SPARK-40609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710080#comment-17710080 ]
Yuming Wang commented on SPARK-40609:
-------------------------------------
{code:scala}
// Benchmark is defined in Spark's test sources, so the spark-core tests jar must be on the classpath.
import org.apache.spark.benchmark.Benchmark

// 40M rows: column a is bigint, column b holds the same value as decimal(18, 0).
val numRows = 1024 * 1024 * 40
spark.sql(s"CREATE TABLE t using parquet AS SELECT id as a, cast(id as decimal(18, 0)) as b FROM range(${numRows}L)")

val benchmark = new Benchmark("Benchmark equal with cast", numRows, minNumIters = 2)

// Baseline: join bigint to decimal(18, 0) and let Spark pick the implicit cast.
benchmark.addCase("default") { _ =>
  spark.sql("SELECT * FROM t t1 join t t2 on t1.a = t2.b").write.format("noop").mode("Overwrite").save()
}
// Compare both join keys as bigint.
benchmark.addCase("cast to bigint") { _ =>
  spark.sql("SELECT * FROM t t1 join t t2 on cast(t1.a as bigint) = cast(t2.b as bigint)").write.format("noop").mode("Overwrite").save()
}
// Compare both join keys as decimal(18, 0).
benchmark.addCase("cast to decimal") { _ =>
  spark.sql("SELECT * FROM t t1 join t t2 on cast(t1.a as decimal(18, 0)) = cast(t2.b as decimal(18, 0))").write.format("noop").mode("Overwrite").save()
}
benchmark.run()
{code}
{noformat}
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark equal with cast:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
default                                           34594          35381        1113          1.2         824.8       1.0X
cast to bigint                                    29056          29367         440          1.4         692.7       1.2X
cast to decimal                                   32528          33081         783          1.3         775.5       1.1X
{noformat}
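As a sanity check on the table, the derived columns can be recomputed from the best times. This is a minimal sketch and assumes that Rate, Per Row and Relative are all derived from Best Time(ms) and the row count passed to the benchmark; the names {{bestMs}}, {{baselineMs}} and {{derived}} are only illustrative.
{code:scala}
// Best times (ms) copied from the table above; numRows matches the benchmark script.
val numRows = 1024L * 1024 * 40
val bestMs = Seq(
  "default" -> 34594.0,
  "cast to bigint" -> 29056.0,
  "cast to decimal" -> 32528.0)
val baselineMs = bestMs.head._2 // the first case is the 1.0X reference

val derived = bestMs.map { case (name, ms) =>
  val rateMPerSec = numRows / (ms * 1e3) // rows per microsecond == million rows per second
  val perRowNs = ms * 1e6 / numRows      // milliseconds converted to nanoseconds, per row
  val relative = baselineMs / ms         // speedup over the default case
  (name, rateMPerSec, perRowNs, relative)
}

derived.foreach { case (name, rate, perRow, rel) =>
  println(f"$name%-16s $rate%.1f M/s  $perRow%.1f ns/row  $rel%.1fX")
}
{code}
Under that assumption, the unrounded speedups are about 1.19X for the bigint cast and about 1.06X for the decimal cast, which the table rounds to 1.2X and 1.1X.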
> Casts types according to bucket info for Equality expression
> ------------------------------------------------------------
>
> Key: SPARK-40609
> URL: https://issues.apache.org/jira/browse/SPARK-40609
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Yuming Wang
> Priority: Major
>