You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/05/20 04:32:51 UTC

[GitHub] [spark] manuzhang opened a new pull request, #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

manuzhang opened a new pull request, #36615:
URL: https://github.com/apache/spark/pull/36615

   ### What changes were proposed in this pull request?
   When analyzing, apply WidenSetOperationTypes after other rules.
   
   
   ### Why are the changes needed?
   The following SQL returns 1.00 while 1.00000000000000000000 is expected since union should pick the wider precision, the `Decimal(38,20)` from `v / v` .
   
   ```
   CREATE OR REPLACE TEMPORARY VIEW t3 AS VALUES (decimal(1)) tbl(v);
   CREATE OR REPLACE TEMPORARY VIEW t4 AS SELECT CAST(v AS DECIMAL(18, 2)) AS v FROM t3;
   
   SELECT CAST(1 AS DECIMAL(28, 2))
   UNION ALL
   SELECT v / v FROM t4;
   ```
   
   Checking the analyzed logical plan of the above SQL, `Project [cast((v / v)#236 as decimal(28,2)) AS (v / v)#237]`  is added by `WidenSetOperationTypes` before `DecimalPrecision` promoting precision for the divide. The result of `v / v` is cast to the narrower `decimal(28,2)`.
   
   ```
   == Analyzed Logical Plan ==
   CAST(1 AS DECIMAL(28,2)): decimal(28,2)
   Union false, false
   :- Project [CAST(1 AS DECIMAL(28,2))#235]
   :  +- Project [cast(1 as decimal(28,2)) AS CAST(1 AS DECIMAL(28,2))#235]
   :     +- OneRowRelation
   +- Project [cast((v / v)#236 as decimal(28,2)) AS (v / v)#237]
      +- Project [CheckOverflow((promote_precision(cast(v#228 as decimal(18,2))) / promote_precision(cast(v#228 as decimal(18,2)))), DecimalType(38,20), false) AS (v / v)#236]
         +- SubqueryAlias t4
            +- Project [cast(v#226 as decimal(18,2)) AS v#228]
               +- SubqueryAlias t3
                  +- SubqueryAlias tbl
                     +- LocalRelation [v#226]
   ```
   
   Hence, I propose to apply `WidenSetOperationTypes` after `DecimalPrecision`.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Add UT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manuzhang closed pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

Posted by GitBox <gi...@apache.org>.
manuzhang closed pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss
URL: https://github.com/apache/spark/pull/36615


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manuzhang commented on pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

Posted by GitBox <gi...@apache.org>.
manuzhang commented on PR #36615:
URL: https://github.com/apache/spark/pull/36615#issuecomment-1142042879

   Superseded by https://github.com/apache/spark/pull/36698


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] manuzhang commented on pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

Posted by GitBox <gi...@apache.org>.
manuzhang commented on PR #36615:
URL: https://github.com/apache/spark/pull/36615#issuecomment-1132451324

   cc @gengliangwang @cloud-fan @turboFei 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #36615: [SPARK-39238][SQL] Apply WidenSetOperationTypes at last to fix decimal precision loss

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #36615:
URL: https://github.com/apache/spark/pull/36615#issuecomment-1132481952

   Good catch!
   
   This is a long-standing issue. The type coercion for decimal types is really messy as it's not bound to `Expression.resolved`. Changing the rule order does fix this simple query, but I'm afraid it's still fragile as rule order is quite unreliable.
   
   I'd like to have a more aggresive refactor: let's don't require operands of these math operations to be the same decimal type, and we can define the return decimal type for each math operation individually. Then the only thing need to be done in the type coercion is to cast non-decimal operands to decimal types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org