Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2022/08/30 15:50:00 UTC

[jira] [Resolved] (SPARK-38603) Qualified star selection produces duplicated common columns after join then alias

     [ https://issues.apache.org/jira/browse/SPARK-38603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-38603.
---------------------------------
    Resolution: Duplicate

> Qualified star selection produces duplicated common columns after join then alias
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-38603
>                 URL: https://issues.apache.org/jira/browse/SPARK-38603
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>         Environment: OS: Ubuntu 18.04.5 LTS
>                      Scala version: 2.12.15
>            Reporter: Yves Li
>            Priority: Minor
>
> When two DataFrames are joined on a common column and the result is aliased, selecting from the resulting Dataset with a qualified star ({{joined.*}}) returns the common join column twice.
> {code:scala}
> scala> val df1 = Seq((1, 10), (2, 20)).toDF("a", "x")
> df1: org.apache.spark.sql.DataFrame = [a: int, x: int]
> scala> val df2 = Seq((2, 200), (3, 300)).toDF("a", "y")
> df2: org.apache.spark.sql.DataFrame = [a: int, y: int]
> scala> val joined = df1.join(df2, "a").alias("joined")
> joined: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [a: int, x: int ... 1 more field]
> scala> joined.select("*").show()
> +---+---+---+
> |  a|  x|  y|
> +---+---+---+
> |  2| 20|200|
> +---+---+---+
> scala> joined.select("joined.*").show()
> +---+---+---+---+
> |  a|  a|  x|  y|
> +---+---+---+---+
> |  2|  2| 20|200|
> +---+---+---+---+
> scala> joined.select("*").select("joined.*").show()
> +---+---+---+
> |  a|  x|  y|
> +---+---+---+
> |  2| 20|200|
> +---+---+---+
> {code}
> This appears to have been introduced by SPARK-34527 and leads to surprising behaviour: on an earlier version, such as Spark 3.0.2, all three {{show()}} calls produce the same output.
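
The behaviour described above can also be checked without reading the table output, by looking at the column names each selection returns. The following is only a minimal sketch, assuming it is pasted into spark-shell (so the `spark` session and its implicits are in scope, as in the session above). `duplicateColumns` is a hypothetical helper name introduced for illustration, not a Spark API.

```scala
// Hypothetical helper (not part of Spark): names that occur more than once.
def duplicateColumns(cols: Seq[String]): Seq[String] =
  cols.groupBy(identity).collect { case (name, hits) if hits.size > 1 => name }.toSeq.sorted

// Same data and join as in the report.
val df1 = Seq((1, 10), (2, 20)).toDF("a", "x")
val df2 = Seq((2, 200), (3, 300)).toDF("a", "y")
val joined = df1.join(df2, "a").alias("joined")

// Column names produced by the qualified star.
// The report shows a, a, x, y for this selection on Spark 3.2.0.
val qualified = joined.select("joined.*").columns.toSeq
println(s"duplicates via joined.*: ${duplicateColumns(qualified)}")

// Expanding the unqualified star first, as in the third call of the report.
val expanded = joined.select("*").select("joined.*").columns.toSeq
println(s"duplicates after select(*): ${duplicateColumns(expanded)}")
```

On an affected version the first line should list the join column and the second should list nothing, matching the tables in the report. Selecting the plain star first is simply the third call from the report reused as a workaround, not a confirmed fix.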


