Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/21 09:51:14 UTC

[GitHub] [spark] ulysses-you opened a new pull request, #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

ulysses-you opened a new pull request, #38739:
URL: https://github.com/apache/spark/pull/38739

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Make the result decimal type of BinaryArithmetic compatible with negative scale.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   BinaryArithmetic adjusts precision and scale without respecting negative scale, so some operators may return a decimal type whose precision is less than its scale.
   
   This PR fixes three things:
   
   1. Make `Divide` work with negative scale; this is a long-standing bug (a minimal reproduction sketch follows this list). The error message before this fix:
     ```
     -- 3.3
     Caused by: java.lang.AssertionError: assertion failed
       at scala.Predef$.assert(Predef.scala:208)
       at org.apache.spark.sql.types.DecimalType$.adjustPrecisionScale(DecimalType.scala:183)
       at org.apache.spark.sql.catalyst.analysis.DecimalPrecision$$anonfun$decimalAndDecimal$1.applyOrElse(DecimalPrecision.scala:145)
       at org.apache.spark.sql.catalyst.analysis.DecimalPrecision$$anonfun$decimalAndDecimal$1.applyOrElse(DecimalPrecision.scala:94)
       at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
       at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule.$anonfun$transform$3(TypeCoercion.scala:178)
       at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
       at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
       at scala.collection.immutable.List.foldLeft(List.scala:91)
       at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule.$anonfun$transform$2(TypeCoercion.scala:177)
     
     -- 3.4
     Caused by: java.lang.AssertionError: assertion failed
       at scala.Predef$.assert(Predef.scala:208)
       at org.apache.spark.sql.types.DecimalType$.adjustPrecisionScale(DecimalType.scala:184)
       at org.apache.spark.sql.catalyst.expressions.Divide.resultDecimalType(arithmetic.scala:802)
       at org.apache.spark.sql.catalyst.expressions.BinaryArithmetic.dataType(arithmetic.scala:238)
       at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$Division$$anonfun$3.applyOrElse(TypeCoercion.scala:515)
       at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$Division$$anonfun$3.applyOrElse(TypeCoercion.scala:509)
       at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule.$anonfun$transform$3(TypeCoercion.scala:190)
       at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
       at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
       at scala.collection.immutable.List.foldLeft(List.scala:91)
       at org.apache.spark.sql.catalyst.analysis.TypeCoercionBase$CombinedTypeCoercionRule.$anonfun$transform$2(TypeCoercion.scala:189)
     ```
   
   2. Fix `IntegralDivide`, which fails on 3.4:
     ```
     org.apache.spark.sql.AnalysisException: Decimal scale (0) cannot be greater than precision (-4).
       at org.apache.spark.sql.errors.QueryCompilationErrors$.decimalCannotGreaterThanPrecisionError(QueryCompilationErrors.scala:2237)
       at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:49)
       at org.apache.spark.sql.types.DecimalType$.bounded(DecimalType.scala:164)
       at org.apache.spark.sql.catalyst.expressions.IntegralDivide.resultDecimalType(arithmetic.scala:868)
     ```
   
   3. Correct the results of some operators that do not fail outright, e.g. `Remainder` when the right side's precision is greater than 38.
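   
   For item 1, a minimal Catalyst-level reproduction sketch, borrowing the negative-scale attribute style from `DecimalPrecisionSuite`. The types are hypothetical, chosen so that `p1 - s1 + s2` goes negative, and this assumes `spark.sql.legacy.allowNegativeScaleOfDecimal` is enabled:
   
   ```scala
   import org.apache.spark.sql.catalyst.expressions.{AttributeReference, Divide}
   import org.apache.spark.sql.types.DecimalType
   
   val num = AttributeReference("num", DecimalType(1, 0))()    // p1 = 1, s1 = 0
   val den = AttributeReference("den", DecimalType(3, -10))()  // p2 = 3, s2 = -10
   
   // Divide derives intDig = p1 - s1 + s2 = 1 - 0 + (-10) = -9, so the computed
   // precision ends up smaller than the scale and the assertion in
   // DecimalType.adjustPrecisionScale fires (on 3.4 directly from
   // Divide.resultDecimalType; on 3.3 inside the DecimalPrecision analyzer rule).
   Divide(num, den).dataType
   ```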
   
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes. This is a bug fix that takes effect when `spark.sql.legacy.allowNegativeScaleOfDecimal` is enabled.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   If benchmark tests were added, please run the benchmarks in GitHub Actions for the consistent environment, and the instructions could accord to: https://spark.apache.org/developer-tools.html#github-workflow-benchmarks.
   -->
   Added tests; a sketch of their shape is below.
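   
   The added suite test has the following shape, as visible in the truncated review diff later in this thread (the body here is an illustrative placeholder, not the actual patch):
   
   ```scala
   test("SPARK-41207: Fix BinaryArithmetic with negative scale") {
     withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "true") {
       // Illustrative: construct negative-scale decimal inputs and verify that
       // Divide / IntegralDivide / Remainder resolve to a valid result type
       // and evaluate, instead of throwing the errors quoted above.
     }
   }
   ```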




[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1029960652


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   In this PR, I changed all BinaryArithmetic operators with negative scale. The test changes you see are actually legacy issues that date back to when negative scale was first supported, not related to https://github.com/apache/spark/pull/36698. I really cannot understand why `decimal(38, -9)` can appear in Spark; it implies overflow.
   The regression in `IntegralDivide` is caused by https://github.com/apache/spark/pull/36698. We can move the code into `IntegralDivide` if we want to narrow the change.





[GitHub] [spark] cloud-fan commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1027990670


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest
       }.isEmpty)
     }
   }
+
+  test("SPARK-41207: Fix BinaryArithmetic with negative scale") {
+    withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "true") {

Review Comment:
   I don't think we need to add new features for a legacy config, unless this is a regression. Can you clarify which commit caused the regression?





[GitHub] [spark] github-actions[bot] closed pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale
URL: https://github.com/apache/spark/pull/38739




[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1028178729


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest
       }.isEmpty)
     }
   }
+
+  test("SPARK-41207: Fix BinaryArithmetic with negative scale") {
+    withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "true") {

Review Comment:
   There are external reasons not to support negative-scale decimal values. ANSI SQL disallows it, which is why SPARK-30252 turned it off by default. Parquet does not support it either; incidentally, that looks like a bug in `ParquetTable`, which claims negative scale is supported. I'll file an issue for it.
   
   To me there are two choices: either we support it fully and start working through all of the issues and corner cases so it works all the time, or we deprecate it and remove it. It has been three years; I think we can move from legacy to deprecated. Having a "solution" with land mines hidden in it and only a config to protect you is not a good solution at all.





[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1031054191


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   Not sure what you found. I think the reason why Divide and IntegralDivide fail is simple: the SQL standard does not allow negative scale, yet we use its definition formulas to calculate the result precision and scale, so the result precision can come out negative, which is unexpected. So I think the other binary arithmetic operators also should not follow those formulas when the scale is negative.
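   
   A worked instance of that failure (assuming `IntegralDivide.resultDecimalType` follows the integral-division rule `precision = p1 - s1 + s2`, as the stack trace above suggests; the input types are hypothetical): a left operand of `decimal(1, 0)` and a right operand of `decimal(1, -5)` yield `1 - 0 + (-5) = -4`, which is exactly the "Decimal scale (0) cannot be greater than precision (-4)" error quoted in the PR description.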





[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1033752088


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   I am fine with this going in just for `IntegralDivide`, and, assuming others are okay with it, for regular `Divide` too. I think the others should be left as is, at least without more discussion.





[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1034221066


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   @cloud-fan what do you think? Only fix the regression and leave the others as is.





[GitHub] [spark] github-actions[bot] commented on pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #38739:
URL: https://github.com/apache/spark/pull/38739#issuecomment-1463026755

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!




[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1031557392


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   What you are saying is that you want to eliminate negative-scale decimal values by turning them into scale-0 decimal values, because the SQL standard does not allow negative scale, but only for binary math expressions. Why not everywhere? Why should abs still return a negative-scale decimal? Why should cast still let us cast to a negative-scale decimal? How is that better than removing support for negative-scale decimal entirely?





[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1030661428


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   Sorry about the long explanation. I am not sure how to make it any shorter. 
   
   By definition a decimal multiply adds the scales (see https://docs.oracle.com/javase/8/docs/api/java/math/BigDecimal.html); there is no special case for negative scale versus positive scale. It adds them.
   
   ```
   scala> val lhs = BigDecimal("123E10")
   lhs: scala.math.BigDecimal = 1.23E+12
   
   scala> lhs.scale
   res0: Int = -10
   
   scala> val rhs = BigDecimal("9E1")
   rhs: scala.math.BigDecimal = 9E+1
   
   scala> rhs.scale
   res1: Int = -1
   
   scala> val ret = rhs * lhs
   ret: scala.math.BigDecimal = 1.107E+14
   
   scala> ret.scale
   res2: Int = -11
   ```
   
   By definition the resulting precision of a multiply is at most `lhs.precision + rhs.precision + 1`. Again, there is no carve-out for negative versus positive scale; this is in the SQL standard and it is also how decimal math works. The problem is that by resetting the values to a scale of 0, you effectively change the precision of the result. This does not follow the standard rules for decimal operations. You get an equivalent answer, but the cost of storing the result is now much greater, so much so that many results will not fit and will overflow when they could have fit had the result kept a negative scale.
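   
   A worked instance using the test's own types: `decimal(3, -10) * decimal(1, -1)` has result scale `-10 + (-1) = -11` and result precision `3 + 1 + 1 = 5`, i.e. `decimal(5, -11)`. Re-expressing that result at scale 0 needs `5 + 11 = 16` digits, i.e. `decimal(16, 0)`, which is exactly the changed expectation in this test and more than triple the precision.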
   
   I am -1 on this change as is.  It is wrong to modify multiply in this way.  
   
   Divide is technically wrong in a number of ways with negative scale decimal values even before 3.4.0. The scale of a divide is LHS.scale - RHS.scale. This includes negative scale values. There is technically a great deal of scale loss when doing a decimal divide. Spark follows Hive and most other SQL implementations by having at least a scale of 6 in the output of the divide, and some complicated math to calculate the output precision that goes with it. But that does not deal with negative values cleanly. It would be really nice to know what Hive does in these cases, or what MsSQL does, or really anything that supports negative scale. Do they all return errors? What type do they return if it is not an error? It appears that the SQL spec itself has the bug in it, and I am not inclined to "fix" the spec without some understanding of what others are doing too.
   
   The reason `IntegralDivide` fails now, where it didn't before, is that the result was never checked for overflow before being converted to a Long. `IntegralDivide` returns a Long, so the overflow check that 3.4.0 performs after the divide was skipped entirely in previous versions because the output is a Long. Look at the rule in DecimalPrecision:
   
   https://github.com/apache/spark/blob/fbbcf9434ac070dd4ced4fb9efe32899c6db12a9/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L192-L202
   
   If the output type is not a decimal, it does nothing. If you want `IntegralDivide` to act the same as before, you just need to remove its overflow check. I am not sure whether that is a good thing or not.





[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1027952685


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##########
@@ -234,7 +234,17 @@ abstract class BinaryArithmetic extends BinaryOperator
   }
 
   override def dataType: DataType = (left.dataType, right.dataType) match {
-    case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
+    case (DecimalType.Fixed(_p1, _s1), DecimalType.Fixed(_p2, _s2)) =>
+      // compatible with negative scale
+      val (p1, s1, p2, s2) = if (_s1 < 0 && _s2 < 0) {

Review Comment:
   Good question; changed the name.
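   
   For reference, a hypothetical shape for the renamed block. The diff above is truncated, so the variable names, the widening rule, and the handling of mixed-sign scales here are inferred (the `p - s` widening matches the `DecimalType(16, 0)` expectation in the updated test), not the actual patch:
   
   ```scala
   case (DecimalType.Fixed(leftPrecision, leftScale), DecimalType.Fixed(rightPrecision, rightScale)) =>
     // Compatible with negative scale: widen each side to scale 0 before
     // applying the standard precision/scale formulas, since decimal(p, s)
     // with s < 0 needs p - s digits when rewritten at scale 0.
     val (p1, s1, p2, s2) = if (leftScale < 0 && rightScale < 0) {
       (leftPrecision - leftScale, 0, rightPrecision - rightScale, 0)
     } else {
       (leftPrecision, leftScale, rightPrecision, rightScale)
     }
   ```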






[GitHub] [spark] LuciferYang commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1027903146


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala:
##########
@@ -234,7 +234,17 @@ abstract class BinaryArithmetic extends BinaryOperator
   }
 
   override def dataType: DataType = (left.dataType, right.dataType) match {
-    case (DecimalType.Fixed(p1, s1), DecimalType.Fixed(p2, s2)) =>
+    case (DecimalType.Fixed(_p1, _s1), DecimalType.Fixed(_p2, _s2)) =>
+      // compatible with negative scale
+      val (p1, s1, p2, s2) = if (_s1 < 0 && _s2 < 0) {

Review Comment:
   nit: are there any other methods that use `_xxx` for local variable names?
   
   





[GitHub] [spark] ulysses-you commented on pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on PR #38739:
URL: https://github.com/apache/spark/pull/38739#issuecomment-1321983710

   cc @revans2 @gengliangwang @cloud-fan 




[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1029400640


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest
       }.isEmpty)
     }
   }
+
+  test("SPARK-41207: Fix BinaryArithmetic with negative scale") {
+    withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "true") {

Review Comment:
   To be clear, we don't have it on in production. We just noticed this while matching the functionality changes from #36698 in the RAPIDS Accelerator for Apache Spark, and filed SPARK-41207. Most of the changes were small and logical; for example, when the output precision would exceed 38, Spark now rounds after doing an add instead of adjusting the scale and rounding the LHS and RHS before the add. This is one place that is arguably a regression, and we wanted to be sure it was documented.
   
   Putting on my Spark contributor hat now: negative-scale support is off by default as a legacy feature, but it is not deprecated. I am of the opinion that we should not support something halfway. If add only worked for positive numbers, we would fix it right away. But from what I have seen, negative-scale decimal is not widely used, and "officially supporting negative scale is really a hard problem". As such I would like to see us deprecate support for it and not fix bugs that come up.
   
   If, on the other hand, there are enough contributors who want or need support for negative-scale decimal, then they should come up with a plan to get it to a good place.
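   
   A hedged illustration of that add-rounding difference using plain `java.math.BigDecimal` (conceptual only; the values are hypothetical and this is not the exact Spark code path):
   
   ```scala
   import java.math.{BigDecimal => JBigDecimal, RoundingMode}
   
   val x = new JBigDecimal("0.0000000005")  // 5e-10
   val y = new JBigDecimal("0.0000000005")
   
   // Old behavior, conceptually: round each operand to the result scale
   // first, then add: 1e-9 + 1e-9 = 2e-9.
   val roundThenAdd = x.setScale(9, RoundingMode.HALF_UP)
     .add(y.setScale(9, RoundingMode.HALF_UP))
   
   // New behavior, conceptually: add at full scale (the exact sum is 1e-9),
   // then round once at the end.
   val addThenRound = x.add(y).setScale(9, RoundingMode.HALF_UP)
   
   assert(roundThenAdd != addThenRound)  // 2E-9 vs 1E-9: the last digit differs
   ```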





[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1031953045


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   I think this behavior follows what the `spark.sql.legacy.allowNegativeScaleOfDecimal.enabled` doc says:
   https://github.com/apache/spark/blob/a205e97ad9ae7894b2ec27e5253da21c4500fc8c/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L3308-L3313
   
   As I mentioned, I'm OK with only fixing the `IntegralDivide` regression by changing negative scale to 0 and leaving the others alone. The cost of changing the behavior of every expression that supports decimal types is huge.





[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1028718134


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest
       }.isEmpty)
     }
   }
+
+  test("SPARK-41207: Fix BinaryArithmetic with negative scale") {
+    withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "true") {

Review Comment:
   @cloud-fan, so far it seems the only regression is `IntegralDivide`, which is affected by https://github.com/apache/spark/pull/36698. The other issues have existed for a long time. I think the root cause of all of them is the same, and this PR fixes it. I'm not sure this counts as a new feature; it is more a correction of legacy issues.





[GitHub] [spark] cloud-fan commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1028793298


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala:
##########
@@ -3532,6 +3532,49 @@ class DataFrameSuite extends QueryTest
       }.isEmpty)
     }
   }
+
+  test("SPARK-41207: Fix BinaryArithmetic with negative scale") {
+    withSQLConf(SQLConf.LEGACY_ALLOW_NEGATIVE_SCALE_OF_DECIMAL_ENABLED.key -> "true") {

Review Comment:
   I'm not sure this fixes all issues. Officially supporting negative scale is really a hard problem. Do you really turn on this config in production?
   





[GitHub] [spark] revans2 commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

Posted by GitBox <gi...@apache.org>.
revans2 commented on code in PR #38739:
URL: https://github.com/apache/spark/pull/38739#discussion_r1029404170


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala:
##########
@@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends AnalysisTest with BeforeAndAfter {
       val a = AttributeReference("a", DecimalType(3, -10))()
       val b = AttributeReference("b", DecimalType(1, -1))()
       val c = AttributeReference("c", DecimalType(35, 1))()
-      checkType(Multiply(a, b), DecimalType(5, -11))
-      checkType(Multiply(a, c), DecimalType(38, -9))
-      checkType(Multiply(b, c), DecimalType(37, 0))
+      checkType(Multiply(a, b), DecimalType(16, 0))

Review Comment:
   If the only regression is in [`IntegralDivide`](https://github.com/apache/spark/pull/38739#discussion_r1028718134), then why are there changes to the output type of `Multiply`, especially when this test was not modified as part of #36698? It feels like you are making much wider changes than just fixing `IntegralDivide` for negative-scale decimal, and I really would like to understand why that is.




