You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/24 19:22:32 UTC

[GitHub] [spark] MaxGekk opened a new pull request, #37649: [WIP][SQL] Modify the interval value of decimal in `changePrecision` on errors only

MaxGekk opened a new pull request, #37649:
URL: https://github.com/apache/spark/pull/37649

   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
     8. If you want to add or modify an error type or message, please read the guideline first in
        'core/src/main/resources/error/README.md'.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   
   ### Why are the changes needed?
   To don't confuse users by error messages. That improves user experience with Spark SQL.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The PR changes user-facing errors.
   
   ### How was this patch tested?
   By running the affected test suites:
   ```
   $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z cast.sql"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a diff in pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #37649:
URL: https://github.com/apache/spark/pull/37649#discussion_r954423097


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala:
##########
@@ -394,48 +394,49 @@ final class Decimal extends Ordered[Decimal] with Serializable {
     if (precision == this.precision && scale == this.scale) {
       return true
     }
+    var lv = longVal
     DecimalType.checkNegativeScale(scale)
-    // First, update our longVal if we can, or transfer over to using a BigDecimal
+    // First, update our lv if we can, or transfer over to using a BigDecimal
     if (decimalVal.eq(null)) {
       if (scale < _scale) {
         // Easier case: we just need to divide our scale down
         val diff = _scale - scale
         val pow10diff = POW_10(diff)
         // % and / always round to 0
-        val droppedDigits = longVal % pow10diff
-        longVal /= pow10diff
+        val droppedDigits = lv % pow10diff
+        lv /= pow10diff
         roundMode match {
           case ROUND_FLOOR =>
             if (droppedDigits < 0) {
-              longVal += -1L
+              lv += -1L
             }
           case ROUND_CEILING =>
             if (droppedDigits > 0) {
-              longVal += 1L
+              lv += 1L
             }
           case ROUND_HALF_UP =>
             if (math.abs(droppedDigits) * 2 >= pow10diff) {
-              longVal += (if (droppedDigits < 0) -1L else 1L)
+              lv += (if (droppedDigits < 0) -1L else 1L)
             }
           case ROUND_HALF_EVEN =>
             val doubled = math.abs(droppedDigits) * 2
-            if (doubled > pow10diff || doubled == pow10diff && longVal % 2 != 0) {
-              longVal += (if (droppedDigits < 0) -1L else 1L)
+            if (doubled > pow10diff || doubled == pow10diff && lv % 2 != 0) {
+              lv += (if (droppedDigits < 0) -1L else 1L)
             }
           case _ =>
             throw QueryExecutionErrors.unsupportedRoundingMode(roundMode)
         }
       } else if (scale > _scale) {
-        // We might be able to multiply longVal by a power of 10 and not overflow, but if not,
+        // We might be able to multiply lv by a power of 10 and not overflow, but if not,
         // switch to using a BigDecimal
         val diff = scale - _scale
         val p = POW_10(math.max(MAX_LONG_DIGITS - diff, 0))
-        if (diff <= MAX_LONG_DIGITS && longVal > -p && longVal < p) {
-          // Multiplying longVal by POW_10(diff) will still keep it below MAX_LONG_DIGITS
-          longVal *= POW_10(diff)
+        if (diff <= MAX_LONG_DIGITS && lv > -p && lv < p) {
+          // Multiplying lv by POW_10(diff) will still keep it below MAX_LONG_DIGITS
+          lv *= POW_10(diff)
         } else {
           // Give up on using Longs; switch to BigDecimal, which we'll modify below
-          decimalVal = BigDecimal(longVal, _scale)
+          decimalVal = BigDecimal(lv, _scale)

Review Comment:
   shall we avoid updating `decimalVal` as well if we eventually return false?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #37649:
URL: https://github.com/apache/spark/pull/37649#issuecomment-1226865324

   Merging to master. Thank you, @srielau @gengliangwang @cloud-fan for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #37649:
URL: https://github.com/apache/spark/pull/37649#issuecomment-1226744505

   > This change seems to break the design.
   
   The bad design is to leave a Decimal value in inconsistent/wrong state between invokes of Decimal methods. Seems like the current assumption is `changePrecision()` can be invoked ONLY once (since we cannot predict either the method fails or not). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #37649:
URL: https://github.com/apache/spark/pull/37649#issuecomment-1226667234

   > This change seems to break the design.
   
   I don't think so. Having side effects for better performance is OK, but it doesn't mean we can leave the decimal value in a "crashed" state. `changePrecision` should be a no-op if it returns false.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors
URL: https://github.com/apache/spark/pull/37649


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #37649: [SPARK-40209][SQL] Don't change the interval value of Decimal in `changePrecision()` on errors

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on PR #37649:
URL: https://github.com/apache/spark/pull/37649#issuecomment-1226225115

   Spark's decimal has two methods by design:
   * changePrecision: update in place for better performance
   * toPrecision: make a copy and check if changing precision works
   
   This change seems to break the design. Also, as mentioned in https://github.com/apache/spark/pull/37620#discussion_r954096295, I think it is better to show the original value string representation in the error


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org