Posted to issues@spark.apache.org by "Marco Gaido (Jira)" <ji...@apache.org> on 2019/09/19 07:26:00 UTC
[jira] [Commented] (SPARK-29123) DecimalType multiplication precision loss
[ https://issues.apache.org/jira/browse/SPARK-29123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933128#comment-16933128 ]
Marco Gaido commented on SPARK-29123:
-------------------------------------
You can set {{spark.sql.decimalOperations.allowPrecisionLoss}} to {{false}} if you do not want to risk truncation in your operations. Otherwise, properly tuning the precision and scale of your input schema helps too.
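For context, the result type Spark picks for a decimal multiplication follows its bounded-arithmetic promotion rule: the unbounded result needs precision p1+p2+1 and scale s1+s2; when that exceeds 38 digits, Spark keeps the integral digits and shrinks the scale, but never below min(scale, 6). A minimal sketch of that rule in plain Python (the function name is illustrative, not a Spark API):

```python
def multiply_result_type(p1, s1, p2, s2, max_precision=38, min_scale=6):
    """Sketch of Spark's decimal(p1,s1) * decimal(p2,s2) result type."""
    # Unbounded result of a decimal multiplication
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision <= max_precision:
        return precision, scale
    # Precision overflows 38 digits: keep all integral digits and cut the
    # scale, but never below min(scale, 6) -- remaining fraction is rounded
    int_digits = precision - scale
    adjusted_scale = max(max_precision - int_digits, min(scale, min_scale))
    return max_precision, adjusted_scale

print(multiply_result_type(38, 10, 38, 10))  # -> (38, 6)
```

This is why decimal(38,10) * decimal(38,10) comes out as decimal(38,6). With {{allowPrecisionLoss}} set to {{false}}, Spark keeps the required scale instead and returns null for values that no longer fit.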
> DecimalType multiplication precision loss
> ------------------------------------------
>
> Key: SPARK-29123
> URL: https://issues.apache.org/jira/browse/SPARK-29123
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.3
> Reporter: Benny Lu
> Priority: Major
>
> When doing multiplication with PySpark, it seems PySpark is losing precision.
> For example, when multiplying two columns of type decimal(38,10), the result type is decimal(38,6) instead of decimal(38,10). The product is also rounded to fit that reduced scale, which gives an incorrect result.
> {code:java}
> from decimal import Decimal
> from pyspark.sql.types import DecimalType, StructType, StructField
> schema = StructType([StructField("amount", DecimalType(38,10)), StructField("fx", DecimalType(38,10))])
> df = spark.createDataFrame([(Decimal("233.00"), Decimal("1.1403218880"))], schema=schema)
> df.printSchema()
> df = df.withColumn("amount_usd", df.amount * df.fx)
> df.printSchema()
> df.show()
> {code}
> Result
> {code:java}
> >>> df = df.withColumn("amount_usd", df.amount * df.fx)
> >>> df.printSchema()
> root
> |-- amount: decimal(38,10) (nullable = true)
> |-- fx: decimal(38,10) (nullable = true)
> |-- amount_usd: decimal(38,6) (nullable = true)
> >>> df.show()
> +--------------+------------+----------+
> | amount| fx|amount_usd|
> +--------------+------------+----------+
> |233.0000000000|1.1403218880|265.695000|
> +--------------+------------+----------+
> {code}
> When the result is rounded to two decimals, it returns 265.70, but the exact product is 265.694999904, so the correctly rounded value should be 265.69.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org