Posted to issues@spark.apache.org by "Navin Kumar (Jira)" <ji...@apache.org> on 2023/10/06 17:49:00 UTC

[jira] [Updated] (SPARK-45438) Decimal precision exceeds max precision error when using unary minus on min Decimal values on Scala 2.13 Spark

     [ https://issues.apache.org/jira/browse/SPARK-45438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navin Kumar updated SPARK-45438:
--------------------------------
    Summary: Decimal precision exceeds max precision error when using unary minus on min Decimal values on Scala 2.13 Spark  (was: Decimal precision exceeds max precision error when using unary minus on min Decimal values)

> Decimal precision exceeds max precision error when using unary minus on min Decimal values on Scala 2.13 Spark
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45438
>                 URL: https://issues.apache.org/jira/browse/SPARK-45438
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 3.3.2, 3.4.0, 3.4.1, 3.5.0
>            Reporter: Navin Kumar
>            Priority: Major
>              Labels: scala
>
> When submitting an application to Spark built with Scala 2.13, there are Decimal overflow issues that show up when using unary minus (and also {{abs()}}, which uses unary minus under the hood).
> Here is an example PySpark reproduction case:
> {code}
> from decimal import Decimal
> from pyspark.sql import SparkSession
> from pyspark.sql.types import StructType, StructField, DecimalType
> spark = SparkSession.builder \
>       .master("local[*]") \
>       .appName("decimal_precision") \
>       .config("spark.rapids.sql.explain", "ALL") \
>       .config("spark.sql.ansi.enabled", "true") \
>       .config("spark.sql.legacy.allowNegativeScaleOfDecimal", 'true') \
>       .getOrCreate()  
> precision = 38
> scale = 0
> DECIMAL_MIN = Decimal('-' + ('9' * precision) + 'e' + str(-scale))
> data = [[DECIMAL_MIN]]
> schema = StructType([
>     StructField("a", DecimalType(precision, scale), True)])
> df = spark.createDataFrame(data=data, schema=schema)
> df.selectExpr("a", "-a").show()
> {code}
> This particular example runs successfully on Spark built with Scala 2.12, but throws a {{java.math.ArithmeticException}} on Spark built with Scala 2.13.
> If you change the value of {{DECIMAL_MIN}} in the previous code to a value just above the original {{DECIMAL_MIN}} (leading digit 8 instead of 9), no exception is thrown; instead you get an incorrect answer (possibly due to overflow):
> {code}
> ...
> DECIMAL_MIN = Decimal('-8' + ('9' * (precision-1)) + 'e' + str(-scale))
> ...
> {code} 
> Output:
> {code}
> +--------------------+--------------------+
> |                   a|               (- a)|
> +--------------------+--------------------+
> |-8999999999999999...|90000000000000000...|
> +--------------------+--------------------+
> {code}
> It looks like the code in {{Decimal.scala}} uses {{scala.math.BigDecimal}}. See https://github.com/scala/bug/issues/11590 for updates on how Scala 2.13 handles BigDecimal. It appears that a {{java.math.MathContext}} is missing when these operations are performed.
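> Below is a minimal, hypothetical sketch of the suspected mechanism (not the actual {{Decimal.scala}} code; the value names are illustrative only). {{java.math.MathContext.DECIMAL128}} permits only 34 significant digits, so if the negation is performed under that default context rather than an explicit or unlimited one, a 38-digit value is rounded instead of negated exactly, which would be consistent with both the incorrect answer and the {{java.math.ArithmeticException}} shown above.
> {code}
> import java.math.MathContext
>
> // The 38-digit minimum from the repro above: -999...9 (38 nines), precision 38, scale 0.
> val decimalMin = BigDecimal("-" + ("9" * 38), MathContext.UNLIMITED)
>
> // DECIMAL128 -- the usual default context for scala.math.BigDecimal -- allows only
> // 34 significant digits, fewer than DecimalType's maximum precision of 38.
> println(MathContext.DECIMAL128.getPrecision)                   // 34
>
> // If the negation runs under a 34-digit context, the value is rounded rather than
> // negated exactly: 38 nines round up to 1E+38, which no longer fits DECIMAL(38, 0).
> println(decimalMin.bigDecimal.negate(MathContext.DECIMAL128))  // rounded to 34 digits
>
> // With an explicit unlimited context, all 38 digits survive the negation.
> println(decimalMin.bigDecimal.negate(MathContext.UNLIMITED))   // exactly 38 nines
> {code}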


