Posted to issues@spark.apache.org by "Vinod KC (Jira)" <ji...@apache.org> on 2022/04/27 16:27:00 UTC
[jira] [Commented] (SPARK-25177) When dataframe decimal type column having scale higher than 6, 0 values are shown in scientific notation

    [ https://issues.apache.org/jira/browse/SPARK-25177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528877#comment-17528877 ] 

Vinod KC commented on SPARK-25177:
----------------------------------

In case anyone is looking for a workaround to convert a 0 shown in scientific notation to plain text, this code snippet may help.

 
{code:java}
import org.apache.spark.sql.functions.{col, md5, udf}
import org.apache.spark.sql.types.Decimal

val handleBigDecZeroUDF = udf((decimalVal: Decimal) => {
  // With scale > 6, a zero value prints as e.g. 0E-7; toPlainString avoids the exponent
  if (decimalVal.scale > 6) {
    decimalVal.toBigDecimal.bigDecimal.toPlainString()
  } else {
    decimalVal.toString()
  }
})
 
spark.sql("create table testBigDec (a decimal(10,7), b decimal(10,6), c decimal(10,8))")
spark.sql("insert into testBigDec values(0, 0,0)")
spark.sql("insert into testBigDec values(1, 1, 1)")
val df = spark.table("testBigDec")
df.show(false) // this will show scientific notation

// use custom UDF `handleBigDecZeroUDF` to convert zero into plainText notation 

df.select(
  handleBigDecZeroUDF(col("a")).as("a"),
  md5(handleBigDecZeroUDF(col("a"))).as("a-md5"),
  col("b"),
  handleBigDecZeroUDF(col("c")).as("c")
).show(false)
{code}
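
The threshold of 6 seems to match the documented behavior of java.math.BigDecimal.toString, which switches to scientific notation once the adjusted exponent drops below -6. The sketch below is only an illustration of that rule outside Spark; the value names (zeroScale7, zeroScale6, oneScale7) are made up for this example, and it assumes that the Spark decimal ends up being printed through java.math.BigDecimal.

{code:scala}
import java.math.BigDecimal

// Zero with scale 7: adjusted exponent is -7, so toString uses scientific notation
val zeroScale7 = new BigDecimal("0").setScale(7)
// Zero with scale 6: adjusted exponent is -6, so toString stays in plain notation
val zeroScale6 = new BigDecimal("0").setScale(6)
// A non-zero value with scale 7: adjusted exponent is 0, so plain notation is used
val oneScale7 = new BigDecimal("1").setScale(7)

println(zeroScale7.toString)       // 0E-7
println(zeroScale7.toPlainString)  // 0.0000000
println(zeroScale6.toString)       // 0.000000
println(oneScale7.toString)        // 1.0000000
{code}

Since toPlainString never emits an exponent, the scale check in the UDF above mainly keeps the original toString path for columns that are not affected.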

 

> When dataframe decimal type column having scale higher than 6, 0 values are shown in scientific notation
> --------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25177
>                 URL: https://issues.apache.org/jira/browse/SPARK-25177
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Vinod KC
>            Priority: Minor
>              Labels: bulk-closed
>
> If the scale of a decimal type is > 6, a 0 value is shown in scientific notation. As a result, saving the dataframe output to an external database fails because of the scientific notation on "0" values.
> Eg: In Spark
>  --------------
>  spark.sql("create table test (a decimal(10,7), b decimal(10,6), c decimal(10,8))")
>  spark.sql("insert into test values(0, 0,0)")
>  spark.sql("insert into test values(1, 1, 1)")
>  spark.table("test").show()
> |        a|       b|         c|
> |     0E-7|0.000000|      0E-8| // If scale > 6, zero is displayed in scientific notation
> |1.0000000|1.000000|1.00000000|
>  
>  Eg: In PostgreSQL
>  --------------
>  CREATE TABLE Testdec (a DECIMAL(10,7), b DECIMAL(10,6), c DECIMAL(10,8));
>  INSERT INTO Testdec VALUES (0,0,0);
>  INSERT INTO Testdec VALUES (1,1,1);
>  select * from Testdec;
>  Result:
>      a     |    b     |     c
> -----------+----------+------------
>  0.0000000 | 0.000000 | 0.00000000
>  1.0000000 | 1.000000 | 1.00000000
> We can make Spark SQL results consistent with other databases such as PostgreSQL.
>  



