Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2020/09/11 14:39:00 UTC
[jira] [Updated] (SPARK-32706) Poor performance when casting invalid decimal string to decimal type
[ https://issues.apache.org/jira/browse/SPARK-32706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuming Wang updated SPARK-32706:
--------------------------------
Attachment: part-00000.parquet
> Poor performance when casting invalid decimal string to decimal type
> --------------------------------------------------------------------
>
> Key: SPARK-32706
> URL: https://issues.apache.org/jira/browse/SPARK-32706
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Priority: Minor
> Attachments: part-00000.parquet
>
>
> Benchmark code and result:
> {code:scala}
> import org.apache.spark.benchmark.Benchmark
>
> val N = 100000000L
> val path = "/tmp/spark/data"
>
> // str1 values ("x0", "x1", ...) are never valid decimals; str2 values ("0", "10", ...) always are.
> spark.range(N).selectExpr(
>   "concat('x', cast(id as string)) as str1",
>   "cast(id * 10 as string) as str2"
> ).write.mode("overwrite").parquet(path)
>
> val benchmark = new Benchmark("Benchmark cast string to decimal", valuesPerIteration = N, minNumIters = 2)
> val df = spark.read.parquet(path)
>
> // The noop sink forces full evaluation of the cast without writing any output.
> benchmark.addCase("valid decimal") { _ =>
>   df.selectExpr("cast(str2 as decimal(18,0))").write.format("noop").mode("overwrite").save()
> }
> benchmark.addCase("invalid decimal") { _ =>
>   df.selectExpr("cast(str1 as decimal(18,0))").write.format("noop").mode("overwrite").save()
> }
> benchmark.run()
> {code}
> {noformat}
> Java HotSpot(TM) 64-Bit Server VM 1.8.0_251-b08 on Mac OS X 10.15.6
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark cast string to decimal:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> valid decimal                                      9571           9645         104         10.4          95.7       1.0X
> invalid decimal                                   81150          81198          67          1.2         811.5       0.1X
> {noformat}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)