You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Munesh Bandaru (JIRA)" <ji...@apache.org> on 2017/09/01 21:33:00 UTC

[jira] [Commented] (SPARK-15918) unionAll returns wrong result when two dataframes has schema in different order

    [ https://issues.apache.org/jira/browse/SPARK-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151185#comment-16151185 ] 

Munesh Bandaru commented on SPARK-15918:
----------------------------------------

As the ticket was closed as 'Not a Problem', a workaround is to use the 'select' to change the order of the columns of one of the dataframe.
But if we have a large number of columns, it doesn't look good to specify all the columns.
So we can use the columns of one dataframe to arrange the other dataframe in the order as below.
{{comb_df = df1.unionAll(df2.select(df1.columns))}}

> unionAll returns wrong result when two dataframes has schema in different order
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-15918
>                 URL: https://issues.apache.org/jira/browse/SPARK-15918
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>         Environment: CentOS
>            Reporter: Prabhu Joseph
>
> On applying unionAll operation between A and B dataframes, they both has same schema but in different order and hence the result has column value mapping changed.
> Repro:
> {code}
> A.show()
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> |tag|year_day|tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value|
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> B.show()
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |dtype|                tag|      time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |    F|C_FNHXUT701Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUDP713.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F| C_FNHXUT718.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUT703Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUR716A.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUT803Z.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F| C_FNHXUT728.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F| C_FNHXUR806.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> A = A.unionAll(B)
> A.show()
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> |tag|           year_day|   tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|    value|
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> |  F|C_FNHXUT701Z.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F|C_FNHXUDP713.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F| C_FNHXUT718.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F|C_FNHXUT703Z.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F|C_FNHXUR716A.CNSTLO|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F|C_FNHXUT803Z.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F| C_FNHXUT728.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> |  F| C_FNHXUR806.CNSTHI|1443790800|    13|     2|    0|  10|      0|   275|   2015| 1.2345|2015275.0|
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> {code}
> On changing the schema of A according to B and doing unionAll works fine
> {code}
> C = A.select("dtype","tag","time","tm_hour","tm_mday","tm_min",”tm_mon”,"tm_sec","tm_yday","tm_year","value","year_day")
> A = C.unionAll(B)
> A.show()
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |dtype|                tag|      time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |    F|C_FNHXUT701Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUDP713.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F| C_FNHXUT718.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUT703Z.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUR716A.CNSTLO|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F|C_FNHXUT803Z.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F| C_FNHXUT728.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> |    F| C_FNHXUR806.CNSTHI|1443790800|     13|      2|     0|    10|     0|    275|   2015|1.2345| 2015275|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org