You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "kazuyukitanimura (via GitHub)" <gi...@apache.org> on 2023/11/06 19:02:00 UTC

[PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

kazuyukitanimura opened a new pull request, #43678:
URL: https://github.com/apache/spark/pull/43678

   ### What changes were proposed in this pull request?
   This PR fixes inaccurate Decimal multiplication and division results.
   
   
   ### Why are the changes needed?
   Decimal multiplication and division results may be inaccurate due to rounding issues.
   #### Multiplication:
   ```
   scala> sql("select  -14120025096157587712113961295153.858047 * -0.4652").show(truncate=false)
   +----------------------------------------------------+                          
   |(-14120025096157587712113961295153.858047 * -0.4652)|
   +----------------------------------------------------+
   |6568635674732509803675414794505.574764              |
   +----------------------------------------------------+
   ```
   The correct answer is `6568635674732509803675414794505.574763`
   
   Please note that the last digit is `3` instead of `4` as
   
   ```
   scala> java.math.BigDecimal("-14120025096157587712113961295153.858047").multiply(java.math.BigDecimal("-0.4652"))
   val res21: java.math.BigDecimal = 6568635674732509803675414794505.5747634644
   ```
   Since the factional part `.574763` is followed by `4644`, it should not be rounded up.
   
   #### Division:
   ```
   scala> sql("select -0.172787979 / 533704665545018957788294905796.5").show(truncate=false)
   +-------------------------------------------------+
   |(-0.172787979 / 533704665545018957788294905796.5)|
   +-------------------------------------------------+
   |-3.237521E-31                                    |
   +-------------------------------------------------+
   ```
   The correct answer is `-3.237520E-31`
   
   Please note that the last digit is `0` instead of `1` as
   
   ```
   scala> java.math.BigDecimal("-0.172787979").divide(java.math.BigDecimal("533704665545018957788294905796.5"), 100, java.math.RoundingMode.DOWN)
   val res22: java.math.BigDecimal = -3.237520489418037889998826491401059986665344697406144511563561222578738E-31
   ```
   Since the factional part `.237520` is followed by `4894...`, it should not be rounded up.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, users will see correct Decimal multiplication and division results.
   Directly multiplying and dividing with `org.apache.spark.sql.types.Decimal()` (not via SQL) will return 39 digit at maximum instead of 38 at maximum and round down instead of round half-up
   
   
   ### How was this patch tested?
   Test added
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "kazuyukitanimura (via GitHub)" <gi...@apache.org>.
kazuyukitanimura commented on code in PR #43678:
URL: https://github.com/apache/spark/pull/43678#discussion_r1395846247


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala:
##########
@@ -225,6 +226,112 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }
 
+  test("SPARK-45786: Decimal multiply, divide, remainder, quot") {

Review Comment:
   Thanks @LuciferYang 
   Yes, this test is assuming the default `spark.sql.ansi.enabled=false`. The default behavior does not throw the exception for overflows, but Ansi mode does. Since this is a random value test, we may have combinations that overflows. 
   ```
   Cause: org.apache.spark.SparkArithmeticException: [NUMERIC_VALUE_OUT_OF_RANGE] 431393072276642444045219979063553045.571 cannot be represented as Decimal(38, 4). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead. SQLSTATE: 22003
   ```
   Sorry that I wasn't aware that there is a GHA for `spark.sql.ansi.enabled=true`. I can modify the test to ignore those cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43678:
URL: https://github.com/apache/spark/pull/43678#discussion_r1395693417


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala:
##########
@@ -225,6 +226,112 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }
 
+  test("SPARK-45786: Decimal multiply, divide, remainder, quot") {

Review Comment:
   This test will failed when `spark.sql.ansi.enabled`
   
   https://github.com/apache/spark/actions/runs/6885072758/job/18728675619
   
   
   <img width="1083" alt="image" src="https://github.com/apache/spark/assets/1475305/6ae259de-30a2-412d-857a-85c8f0452258">
   
   You can reproduce the issue locally by executing `SPARK_ANSI_SQL_MODE=true build/sbt clean "catalyst/testOnly org.apache.spark.sql.catalyst.expressions.ArithmeticExpressionSuite"`
   
   @kazuyukitanimura Can you take a look at this issue? 
   
   also cc @dongjoon-hyun 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun commented on PR #43678:
URL: https://github.com/apache/spark/pull/43678#issuecomment-1799255045

   Merged to master/3.5/3.4. Thank you, @kazuyukitanimura .
   
   Could you make a backporting PR to branch-3.3 too?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.
dongjoon-hyun closed pull request #43678: [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results
URL: https://github.com/apache/spark/pull/43678


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "kazuyukitanimura (via GitHub)" <gi...@apache.org>.
kazuyukitanimura commented on PR #43678:
URL: https://github.com/apache/spark/pull/43678#issuecomment-1799303353

   Thank you all
   
   > Could you make a backporting PR to branch-3.3 too?
   
   I will @dongjoon-hyun 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43678:
URL: https://github.com/apache/spark/pull/43678#issuecomment-1796094167

   test: https://github.com/kazuyukitanimura/spark/actions/runs/6775292999/job/18414265284


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43678:
URL: https://github.com/apache/spark/pull/43678#discussion_r1386189974


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala:
##########
@@ -225,6 +226,112 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }
 
+  test("SPARK-45786: Decimal multiply, divide, remainder, quot") {
+    // Some known cases
+    checkEvaluation(
+      Multiply(
+        Literal(Decimal(BigDecimal("-14120025096157587712113961295153.858047"), 38, 6)),
+        Literal(Decimal(BigDecimal("-0.4652"), 4, 4))
+      ),
+      Decimal(BigDecimal("6568635674732509803675414794505.574763"))
+    )
+    checkEvaluation(
+      Multiply(
+        Literal(Decimal(BigDecimal("-240810500742726"), 15, 0)),
+        Literal(Decimal(BigDecimal("-5677.6988688550027099967697071"), 29, 25))
+      ),
+      Decimal(BigDecimal("1367249507675382200.164877854336665327"))
+    )
+    checkEvaluation(
+      Divide(
+        Literal(Decimal(BigDecimal("-0.172787979"), 9, 9)),
+        Literal(Decimal(BigDecimal("533704665545018957788294905796.5"), 31, 1))

Review Comment:
   I think the test is wrong, this decimal should use precision 32



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43678:
URL: https://github.com/apache/spark/pull/43678#discussion_r1386189974


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala:
##########
@@ -225,6 +226,112 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }
 
+  test("SPARK-45786: Decimal multiply, divide, remainder, quot") {
+    // Some known cases
+    checkEvaluation(
+      Multiply(
+        Literal(Decimal(BigDecimal("-14120025096157587712113961295153.858047"), 38, 6)),
+        Literal(Decimal(BigDecimal("-0.4652"), 4, 4))
+      ),
+      Decimal(BigDecimal("6568635674732509803675414794505.574763"))
+    )
+    checkEvaluation(
+      Multiply(
+        Literal(Decimal(BigDecimal("-240810500742726"), 15, 0)),
+        Literal(Decimal(BigDecimal("-5677.6988688550027099967697071"), 29, 25))
+      ),
+      Decimal(BigDecimal("1367249507675382200.164877854336665327"))
+    )
+    checkEvaluation(
+      Divide(
+        Literal(Decimal(BigDecimal("-0.172787979"), 9, 9)),
+        Literal(Decimal(BigDecimal("533704665545018957788294905796.5"), 31, 1))

Review Comment:
   I think the test is wrong, this decimal should use precision 32



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "kazuyukitanimura (via GitHub)" <gi...@apache.org>.
kazuyukitanimura commented on code in PR #43678:
URL: https://github.com/apache/spark/pull/43678#discussion_r1396515078


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala:
##########
@@ -225,6 +226,112 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }
 
+  test("SPARK-45786: Decimal multiply, divide, remainder, quot") {

Review Comment:
   Addressed #43853



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45786][SQL] Fix inaccurate Decimal multiplication and division results [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43678:
URL: https://github.com/apache/spark/pull/43678#discussion_r1395693417


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala:
##########
@@ -225,6 +226,112 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     }
   }
 
+  test("SPARK-45786: Decimal multiply, divide, remainder, quot") {

Review Comment:
   This test will failed when `spark.sql.ansi.enabled`
   
   https://github.com/apache/spark/actions/runs/6885072758/job/18728675619
   
   
   <img width="1083" alt="image" src="https://github.com/apache/spark/assets/1475305/6ae259de-30a2-412d-857a-85c8f0452258">
   
   You can reproduce the issue locally by executing `SPARK_ANSI_SQL_MODE=true build/sbt clean "catalyst/testOnly org.apache.spark.sql.catalyst.expressions.ArithmeticExpressionSuite"`
   
   @kazuyukitanimura Can you take a look at this issue? 
   
   also cc @dongjoon-hyun Since this patch has been backported to branch-3.4, I'm not sure if this will affect the version release of Spark 3.4.2
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org