Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/14 13:07:39 UTC

[GitHub] [spark] cloud-fan edited a comment on issue #22450: [SPARK-25454][SQL] Avoid precision loss in division with decimal with negative scale

URL: https://github.com/apache/spark/pull/22450#issuecomment-472844445
 
 
   Sorry for the late reply; I've been struggling with this for a long time.
   
   AFAIK there are 2 ways to define a decimal type:
   
   1. The [Java way](https://stackoverflow.com/questions/35435691/bigdecimal-precision-and-scale). `precision` is the number of digits in the `unscaledValue`, and `scale` determines how the `unscaledValue` maps to the actual decimal value, via `unscaledValue * 10^-scale`.
   
   This means,
   `123.45 = 12345 * 10^-2`, so `precision` is 5, `scale` is 2.
   `0.00123 = 123 * 10^-5`, so `precision` is 3, `scale` is 5.
   `12300 = 123 * 10^2`, so `precision` is 3, `scale` is -2.
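   A quick way to see this definition in action is `java.math.BigDecimal` (a minimal sketch; the printed values mirror the examples above):
   
   ```scala
   import java.math.BigDecimal
   
   // precision = number of digits in unscaledValue, value = unscaledValue * 10^-scale
   val a = new BigDecimal("123.45")                       // unscaledValue = 12345
   println(s"precision=${a.precision}, scale=${a.scale}") // precision=5, scale=2
   
   val b = new BigDecimal("0.00123")                      // unscaledValue = 123
   println(s"precision=${b.precision}, scale=${b.scale}") // precision=3, scale=5
   
   // "12300" parses with scale 0; stripTrailingZeros gives the minimal form 123 * 10^2
   val c = new BigDecimal("12300").stripTrailingZeros()   // unscaledValue = 123
   println(s"precision=${c.precision}, scale=${c.scale}") // precision=3, scale=-2
   ```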
   
   2. The SQL way (at least in some older versions of the SQL standard). `precision` is the total number of digits in the decimal value, and `scale` is the number of digits after the decimal point.
   
   This means,
   `123.45` has 5 digits in total, 2 digits after the decimal point, so `precision` is 5, `scale` is 2.
   `0.00123` has 5 digits in total (ignoring the integral part), and 5 digits after the decimal point, so `precision` is 5, `scale` is 5.
   `12300` has 5 digits in total, no digits after the decimal point, so `precision` is 5, `scale` is 0.
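   This is also the convention behind SQL-style declarations like `DECIMAL(precision, scale)`. A minimal sketch in a spark-shell (assuming a `spark` session is available):
   
   ```scala
   // SQL-style declarations stay within 0 <= scale <= precision
   spark.sql("SELECT CAST(123.45  AS DECIMAL(5, 2)) AS a").show()  // 123.45
   spark.sql("SELECT CAST(0.00123 AS DECIMAL(5, 5)) AS b").show()  // 0.00123
   spark.sql("SELECT CAST(12300   AS DECIMAL(5, 0)) AS c").show()  // 12300
   ```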
   
   
   AFAIK many databases follow the SQL way to define the decimal type, i.e. `0 <= scale <= precision`, although the Java way is more flexible. If Spark does not want to follow the SQL way to define the decimal type, I think we should follow the Java way, instead of some middle ground between SQL and Java.
   
   We should also clearly list the tradeoffs of different options.
   
   As an example, if we want to keep supporting negative scales:
   1. we would want to support `scale > precision` as well, to be consistent with the Java way of defining the decimal type.
   2. we need to fix some corner cases of precision loss (which is what this PR is trying to fix).
   3. compatibility with data sources is poor (`sql("select 1e10 as a").write.parquet("/tmp/tt")` would fail; see the repro sketch below).
   4. there may be unknown pitfalls, as negative scale is not widely supported by other databases.
   5. it is fully backward compatible.
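   For point 3, a minimal repro sketch in a spark-shell (the exact decimal type Spark infers for the `1e10` literal is an assumption here; the key point is that it carries a negative scale, which AFAIK Parquet's decimal logical type does not allow, since it requires `0 <= scale <= precision`):
   
   ```scala
   // the column's type is a decimal with negative scale
   sql("select 1e10 as a").printSchema()
   // fails, because the negative-scale decimal type cannot be mapped to Parquet's decimal
   sql("select 1e10 as a").write.parquet("/tmp/tt")
   ```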
