Posted to issues@spark.apache.org by "Yves Li (Jira)" <ji...@apache.org> on 2022/03/19 00:44:00 UTC

[jira] [Created] (SPARK-38603) Qualified star selection produces duplicated common columns after join then alias

Yves Li created SPARK-38603:
-------------------------------

             Summary: Qualified star selection produces duplicated common columns after join then alias
                 Key: SPARK-38603
                 URL: https://issues.apache.org/jira/browse/SPARK-38603
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
         Environment: OS: Ubuntu 18.04.5 LTS
Scala version: 2.12.15
            Reporter: Yves Li


When joining two DataFrames on a common column and then aliasing the result, selecting from the resulting Dataset with a qualified star duplicates the common join column.
{code:scala}
scala> val df1 = Seq((1, 10), (2, 20)).toDF("a", "x")
df1: org.apache.spark.sql.DataFrame = [a: int, x: int]

scala> val df2 = Seq((2, 200), (3, 300)).toDF("a", "y")
df2: org.apache.spark.sql.DataFrame = [a: int, y: int]

scala> val joined = df1.join(df2, "a").alias("joined")
joined: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [a: int, x: int ... 1 more field]

scala> joined.select("*").show()
+---+---+---+
|  a|  x|  y|
+---+---+---+
|  2| 20|200|
+---+---+---+

scala> joined.select("joined.*").show()
+---+---+---+---+
|  a|  a|  x|  y|
+---+---+---+---+
|  2|  2| 20|200|
+---+---+---+---+

scala> joined.select("*").select("joined.*").show()
+---+---+---+
|  a|  x|  y|
+---+---+---+
|  2| 20|200|
+---+---+---+ {code}
This appears to have been introduced by SPARK-34527 and leads to some surprising behaviour, presumably because the qualified star now also expands over the hidden duplicate key columns that SPARK-34527 made resolvable after a USING join. On an earlier version, such as Spark 3.0.2, all three {{show()}} calls produce the same output.
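For reference, the duplication also shows up in the Dataset's schema, and the extra projection in the last example above doubles as a workaround. A minimal sketch (the column arrays are reconstructed from the {{show()}} output above):
{code:scala}
// The duplicated common column is visible in the schema as well:
joined.select("joined.*").columns
// Observed on 3.2.0:      Array(a, a, x, y)
// Expected (as on 3.0.2): Array(a, x, y)

// Workaround: interpose an explicit projection so the qualified star
// expands over the projected output rather than the aliased join:
joined.select("*").select("joined.*").columns
// Array(a, x, y)
{code}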


