Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2021/11/03 03:22:00 UTC

[jira] [Resolved] (SPARK-37191) Allow merging DecimalTypes with different precision values

     [ https://issues.apache.org/jira/browse/SPARK-37191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-37191.
---------------------------------
    Resolution: Fixed

Issue resolved by pull request 34462
[https://github.com/apache/spark/pull/34462]

> Allow merging DecimalTypes with different precision values 
> -----------------------------------------------------------
>
>                 Key: SPARK-37191
>                 URL: https://issues.apache.org/jira/browse/SPARK-37191
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.3, 3.1.0, 3.1.1, 3.2.0
>            Reporter: Ivan
>            Assignee: Ivan
>            Priority: Major
>             Fix For: 3.3.0
>
>
> When merging DecimalTypes with different precision but the same scale, one would get the following error:
> {code:java}
> Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 17 and 12
> 	at org.apache.spark.sql.types.StructType$.$anonfun$merge$2(StructType.scala:652)
> 	at scala.Option.map(Option.scala:230)
> 	at org.apache.spark.sql.types.StructType$.$anonfun$merge$1(StructType.scala:644)
> 	at org.apache.spark.sql.types.StructType$.$anonfun$merge$1$adapted(StructType.scala:641)
> 	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> 	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
> 	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
> 	at org.apache.spark.sql.types.StructType$.merge(StructType.scala:641)
> 	at org.apache.spark.sql.types.StructType.merge(StructType.scala:550) {code}
>  
> We could allow merging DecimalType values with different precision as long as both types have the same scale. This is safe for data correctness, because the narrower type is simply widened and every value of the narrower type fits in the wider one, for example DECIMAL(12, 2) -> DECIMAL(17, 2). Upcasting between different scales, by contrast, cannot be allowed in general, since whether it is lossless depends on the actual values.
>  
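> A minimal sketch of the intended rule (not Spark's actual implementation; the helper name is hypothetical):
> {code:java}
> import org.apache.spark.sql.types.DecimalType
>
> // Hypothetical helper illustrating the rule: equal scales merge to the
> // wider precision (lossless widening); differing scales are rejected.
> def mergeDecimalTypes(a: DecimalType, b: DecimalType): DecimalType = {
>   require(a.scale == b.scale,
>     s"Cannot merge decimal types with different scales ${a.scale} and ${b.scale}")
>   // e.g. DecimalType(12, 2) and DecimalType(17, 2) merge to DecimalType(17, 2)
>   DecimalType(math.max(a.precision, b.precision), a.scale)
> }
> {code}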
> Repro code:
> {code:java}
> import org.apache.spark.sql.types._
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> schema1.merge(schema2) {code}
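> With the fix, this merge should succeed and return the wider type, i.e. a StructType whose single field is DecimalType(17, 2).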
>  
> This also affects Parquet schema merging, which is where the issue was originally discovered:
> {code:java}
> import java.math.BigDecimal
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val data1 = sc.parallelize(Row(new BigDecimal("1234567890000.11")) :: Nil, 1)
> val schema1 = StructType(StructField("col", DecimalType(17, 2)) :: Nil)
> val data2 = sc.parallelize(Row(new BigDecimal("123456789.11")) :: Nil, 1)
> val schema2 = StructType(StructField("col", DecimalType(12, 2)) :: Nil)
> spark.createDataFrame(data2, schema2).write.parquet("/tmp/decimal-test.parquet")
> spark.createDataFrame(data1, schema1).write.mode("append").parquet("/tmp/decimal-test.parquet")
> // Reading the DataFrame fails
> spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").show()
> >>>
> Failed merging schema:
> root
>  |-- col: decimal(17,2) (nullable = true)
> Caused by: Failed to merge fields 'col' and 'col'. Failed to merge decimal types with incompatible precision 12 and 17
> {code}
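> Once fixed, the same read should widen the column to the larger precision instead of failing (expected behaviour, per the rule above):
> {code:java}
> spark.read.option("mergeSchema", "true").parquet("/tmp/decimal-test.parquet").printSchema()
> // Expected:
> // root
> //  |-- col: decimal(17,2) (nullable = true)
> {code}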
>  


