Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2023/04/10 08:00:00 UTC

[jira] [Commented] (SPARK-40609) Casts types according to bucket info for Equality expression

    [ https://issues.apache.org/jira/browse/SPARK-40609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17710080#comment-17710080 ] 

Yuming Wang commented on SPARK-40609:
-------------------------------------


{code:scala}
// Benchmark lives in Spark's test sources, so the test classes must be on the classpath.
import org.apache.spark.benchmark.Benchmark

// 40M rows: column a is bigint, column b holds the same values as decimal(18, 0).
val numRows = 1024 * 1024 * 40
spark.sql(s"CREATE TABLE t using parquet AS SELECT id as a, cast(id as decimal(18, 0)) as b FROM range(${numRows}L)")
val benchmark = new Benchmark("Benchmark equal with cast", numRows, minNumIters = 2)

// Baseline: compare bigint with decimal(18, 0) and leave the cast to implicit type coercion.
benchmark.addCase("default") { _ =>
  spark.sql("SELECT * FROM t t1 join t t2 on t1.a = t2.b").write.format("noop").mode("Overwrite").save()
}

// Compare both sides as bigint.
benchmark.addCase("cast to bigint") { _ =>
  spark.sql("SELECT * FROM t t1 join t t2 on cast(t1.a as bigint) = cast(t2.b as bigint)").write.format("noop").mode("Overwrite").save()
}

// Compare both sides as decimal(18, 0).
benchmark.addCase("cast to decimal") { _ =>
  spark.sql("SELECT * FROM t t1 join t t2 on cast(t1.a as decimal(18, 0)) = cast(t2.b as decimal(18, 0))").write.format("noop").mode("Overwrite").save()
}
benchmark.run()
{code}



{noformat}
OpenJDK 64-Bit Server VM 1.8.0_362-b09 on Mac OS X 13.2.1
Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
Benchmark equal with cast:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
default                                           34594          35381        1113          1.2         824.8       1.0X
cast to bigint                                    29056          29367         440          1.4         692.7       1.2X
cast to decimal                                   32528          33081         783          1.3         775.5       1.1X
{noformat}
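
For reading the table: the derived columns can be recomputed from Best Time(ms) and numRows. The following is a minimal sketch, assuming Rate, Per Row and Relative are all based on the best time rather than the average (this assumption reproduces the numbers above). The names bestMs, baselineMs and derived are illustrative only and are not part of the benchmark.

{code:scala}
// Sketch only: recompute the derived columns from the Best Time(ms) values above.
// Assumption: Rate, Per Row and Relative all use the best time, not the average.
val numRows = 1024L * 1024 * 40
val bestMs = Seq(
  "default" -> 34594.0,
  "cast to bigint" -> 29056.0,
  "cast to decimal" -> 32528.0)
val baselineMs = bestMs.head._2 // the first case is the 1.0X reference

val derived = bestMs.map { case (name, ms) =>
  val rateMPerSec = numRows / (ms * 1000.0) // million rows per second
  val perRowNs = ms * 1e6 / numRows         // nanoseconds per row
  val relative = baselineMs / ms            // speedup over "default"
  (name, rateMPerSec, perRowNs, relative)
}

derived.foreach { case (name, rate, ns, rel) =>
  println(f"$name%-16s $rate%.1f M/s  $ns%.1f ns/row  $rel%.1fX")
}
{code}

Unrounded, "cast to bigint" is about 1.19X and "cast to decimal" about 1.06X relative to the default plan.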




> Casts types according to bucket info for Equality expression
> ------------------------------------------------------------
>
>                 Key: SPARK-40609
>                 URL: https://issues.apache.org/jira/browse/SPARK-40609
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Yuming Wang
>            Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)