Posted to issues@spark.apache.org by "Hao Ren (JIRA)" <ji...@apache.org> on 2019/05/27 13:19:00 UTC

[jira] [Updated] (SPARK-27855) Union failed between 2 datasets of the same type converted from different dataframes

     [ https://issues.apache.org/jira/browse/SPARK-27855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hao Ren updated SPARK-27855:
----------------------------
    Description: 
Two Datasets of the same type, converted from different DataFrames, cannot be unioned.

Here is the code to reproduce the problem. It seems `union` still matches columns using the schemas of the original DataFrames, position by position, even though both have already been converted to the same Dataset type.
{code:java}
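// Run in spark-shell, which imports spark.implicits._ (needed for toDF/as) automatically.
case class Entity(key: Int, a: Int, b: String)
// Columns in the same order as the fields of Entity.
val df1 = Seq((2,2,"2")).toDF("key", "a", "b").as[Entity]
// Same field names and types, but `b` and `a` are swapped.
val df2 = Seq((1,"1",1)).toDF("key", "b", "a").as[Entity]
df1.printSchema
df2.printSchema
df1 union df2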
{code}
Result
{code:java}
defined class Entity
df1: org.apache.spark.sql.Dataset[Entity] = [key: int, a: int ... 1 more field]
df2: org.apache.spark.sql.Dataset[Entity] = [key: int, b: string ... 1 more field]
converted
root
|-- key: integer (nullable = false)
|-- a: integer (nullable = false)
|-- b: string (nullable = true)

root
|-- key: integer (nullable = false)
|-- b: string (nullable = true)
|-- a: integer (nullable = false)

org.apache.spark.sql.AnalysisException: Cannot up cast `a` from string to int as it may truncate
The type path of the target object is:
- field (class: "scala.Int", name: "a")
- root class: "Entity"{code}
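A possible workaround, offered as a hedged sketch rather than a confirmed fix: `Dataset.union` is documented to resolve columns by position, like SQL `UNION ALL`, and `as[Entity]` does not reorder the underlying columns. The second column of `df2` (`b`, a string) is therefore aligned with `a` (an int) in `df1`, and mapping the result back to `Entity` fails on the cast. Matching columns by name avoids this, either with `Dataset.unionByName` (available since Spark 2.3.0) or by selecting `df2`'s columns in `df1`'s order before a positional `union`. The object name `Spark27855Workarounds` and the local session settings are illustrative assumptions, not part of the report.
{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Same case class as in the reproduction above.
case class Entity(key: Int, a: Int, b: String)

// Hypothetical helper object, for illustration only.
object Spark27855Workarounds {
  // Builds the two Datasets from the report and combines them by column name in two ways.
  def run(spark: SparkSession): (Seq[Entity], Seq[Entity]) = {
    import spark.implicits._

    val df1 = Seq((2, 2, "2")).toDF("key", "a", "b").as[Entity]
    val df2 = Seq((1, "1", 1)).toDF("key", "b", "a").as[Entity]

    // Option 1: resolve columns by name (Dataset.unionByName, Spark 2.3.0+).
    val byName = df1.unionByName(df2)

    // Option 2: reorder df2's columns to match df1, then union by position.
    val reordered = df1.union(df2.select(df1.columns.map(col): _*).as[Entity])

    (byName.collect().toSeq, reordered.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    // Local single-threaded session, assumed here only to make the sketch runnable.
    val spark = SparkSession.builder().master("local[1]").appName("SPARK-27855-sketch").getOrCreate()
    try {
      val (byName, reordered) = run(spark)
      println(byName.mkString(", "))
      println(reordered.mkString(", "))
    } finally {
      spark.stop()
    }
  }
}
{code}
Of the two, `unionByName` states the intent most directly; the `select` variant also works on Spark versions before 2.3.0, where `unionByName` does not exist.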

> Union failed between 2 datasets of the same type converted from different dataframes
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-27855
>                 URL: https://issues.apache.org/jira/browse/SPARK-27855
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.3
>            Reporter: Hao Ren
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)