Posted to issues@spark.apache.org by "Franco Bonazza (JIRA)" <ji...@apache.org> on 2019/01/24 14:23:00 UTC
[jira] [Commented] (SPARK-18484) case class datasets - ability to specify decimal precision and scale
[ https://issues.apache.org/jira/browse/SPARK-18484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16751174#comment-16751174 ]
Franco Bonazza commented on SPARK-18484:
----------------------------------------
What if you have a DataFrame with a higher-precision decimal column, e.g. decimal(38,17)? This effectively breaks df.as[TestClass]: it fails on truncation because I can't specify the schema of the resulting Dataset. Am I missing something? This doesn't seem like a non-issue to me. The only workaround I see is not using Datasets.
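To make the truncation concrete, here is a minimal sketch in plain Scala (no Spark dependency). It assumes the check behind df.as[TestClass] requires the target type to have at least as many integer digits and at least as many fractional digits as the source column; that rule is my assumption, not something confirmed in this thread. Dec and fitsWithoutTruncation are hypothetical names used only for illustration, not Spark APIs.

```scala
// Hypothetical stand-in for a decimal type; this is NOT Spark's DecimalType.
final case class Dec(precision: Int, scale: Int) {
  // Digits available to the left of the decimal point.
  def integerDigits: Int = precision - scale
}

object DecimalFit {
  // Assumed rule: the target can hold every source value only if it has
  // at least as many integer digits AND at least as many fractional digits.
  def fitsWithoutTruncation(source: Dec, target: Dec): Boolean =
    target.integerDigits >= source.integerDigits && target.scale >= source.scale

  def main(args: Array[String]): Unit = {
    // decimal(38,18) is what a BigDecimal field maps to, per the issue description.
    val encoderDefault = Dec(38, 18)

    // decimal(38,17) has 21 integer digits; decimal(38,18) only has 20.
    println(fitsWithoutTruncation(Dec(38, 17), encoderDefault))

    // decimal(10,2) has 8 integer digits and 2 fractional digits; both fit.
    println(fitsWithoutTruncation(Dec(10, 2), encoderDefault))
  }
}
```

If that rule holds, a decimal(38,17) column can never be read as decimal(38,18) without risk, because 38 - 17 = 21 integer digits do not fit into 38 - 18 = 20. An explicit cast to DecimalType(38,18) before calling .as[TestClass] might get past the check, but any value that really uses 21 integer digits would not fit the target type.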
> case class datasets - ability to specify decimal precision and scale
> --------------------------------------------------------------------
>
> Key: SPARK-18484
> URL: https://issues.apache.org/jira/browse/SPARK-18484
> Project: Spark
> Issue Type: Improvement
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Damian Momot
> Priority: Major
>
> Currently, when using a decimal type (BigDecimal in a Scala case class), there is no way to enforce precision and scale. This is critical when saving data, both for space usage and for compatibility with external systems (for example, a Hive table), because Spark saves the data as Decimal(38,18).
> {code}
> case class TestClass(id: String, money: BigDecimal)
> val testDs = spark.createDataset(Seq(
>   TestClass("1", BigDecimal("22.50")),
>   TestClass("2", BigDecimal("500.66"))
> ))
> testDs.printSchema()
> {code}
> {code}
> root
> |-- id: string (nullable = true)
> |-- money: decimal(38,18) (nullable = true)
> {code}
> The workaround is to convert the Dataset to a DataFrame before saving and manually cast the column to a specific decimal precision and scale:
> {code}
> import org.apache.spark.sql.types.DecimalType
> val testDf = testDs.toDF()
> testDf
>   .withColumn("money", testDf("money").cast(DecimalType(10,2)))
>   .printSchema()
> {code}
> {code}
> root
> |-- id: string (nullable = true)
> |-- money: decimal(10,2) (nullable = true)
> {code}