Posted to issues@spark.apache.org by "Dilip Biswal (JIRA)" <ji...@apache.org> on 2019/03/18 16:10:00 UTC
[jira] [Comment Edited] (SPARK-27191) union of dataframes depends on order of the columns in 2.4.0

    [ https://issues.apache.org/jira/browse/SPARK-27191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795150#comment-16795150 ] 

Dilip Biswal edited comment on SPARK-27191 at 3/18/19 4:09 PM:
---------------------------------------------------------------

Hello [~mrinal10449],

The Jira you referred to, [SPARK-22335|https://issues.apache.org/jira/browse/SPARK-22335], did not actually result in a code change. Instead, [~viirya] improved the documentation of the union API to clarify that union resolves columns by position, not by name. Here is the link to the [PR|https://github.com/apache/spark/pull/19570/files]. For your use case, the recommended method is 'unionByName'.
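
To illustrate the difference between the two resolution rules, here is a minimal sketch in plain Python. It is not Spark code: a DataFrame is modeled as a (columns, rows) pair, and the helpers `union_by_position` and `union_by_name` are hypothetical names invented for this illustration. The data is taken from the example in the issue description.

```python
# Hypothetical model (not the Spark API): a "DataFrame" is a (columns, rows) pair.

def union_by_position(left, right):
    """Model of position-based union: keep the left column names and
    append the right rows unchanged, ignoring the right column names."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if len(left_cols) != len(right_cols):
        raise ValueError("both sides must have the same number of columns")
    return left_cols, left_rows + right_rows


def union_by_name(left, right):
    """Model of name-based union: reorder each right row so its values
    line up with the left column names before appending."""
    left_cols, left_rows = left
    right_cols, right_rows = right
    if sorted(left_cols) != sorted(right_cols):
        raise ValueError("both sides must have the same column names")
    # For every left column, find where that column sits on the right side.
    index = [right_cols.index(name) for name in left_cols]
    reordered = [tuple(row[i] for i in index) for row in right_rows]
    return left_cols, left_rows + reordered


# Same data as in the issue description.
df_1 = (["col1", "col2"], [("1aa", "1bbbbbbb")])
df_2 = (["col2", "col1"], [("2bbbbbbb", "2aa")])

by_position = union_by_position(df_1, df_2)
by_name = union_by_name(df_1, df_2)

print(by_position[1])  # second row keeps the col2, col1 order of df_2
print(by_name[1])      # second row is reordered to col1, col2
```

In PySpark itself, the corresponding call for the reported case would be `df_1.unionByName(df_2)` in place of `df_1.union(df_2)`.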







> union of dataframes depends on order of the columns in 2.4.0
> ------------------------------------------------------------
>
>                 Key: SPARK-27191
>                 URL: https://issues.apache.org/jira/browse/SPARK-27191
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Mrinal Kanti Sardar
>            Priority: Major
>
> I thought this issue was resolved in 2.3.0, according to https://issues.apache.org/jira/browse/SPARK-22335, but I still see it in 2.4.0.
> {code:java}
> >>> df_1 = spark.createDataFrame([["1aa", "1bbbbbbb"]], ["col1", "col2"])
> >>> df_1.show()
> +----+--------+
> |col1|    col2|
> +----+--------+
> | 1aa|1bbbbbbb|
> +----+--------+
> >>> df_2 = spark.createDataFrame([["2bbbbbbb", "2aa"]], ["col2", "col1"])
> >>> df_2.show()
> +--------+----+
> |    col2|col1|
> +--------+----+
> |2bbbbbbb| 2aa|
> +--------+----+
> >>> df_u = df_1.union(df_2)
> >>> df_u.show()
> +--------+--------+
> |    col1|    col2|
> +--------+--------+
> |     1aa|1bbbbbbb|
> |2bbbbbbb|     2aa|
> +--------+--------+
> >>> spark.version
> '2.4.0'
> >>>
> {code}


