Posted to issues@spark.apache.org by "Marco Gaido (JIRA)" <ji...@apache.org> on 2018/02/16 12:55:00 UTC
[jira] [Commented] (SPARK-23439) Ambiguous reference when selecting
column inside StructType with same name that outer column
[ https://issues.apache.org/jira/browse/SPARK-23439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366945#comment-16366945 ]
Marco Gaido commented on SPARK-23439:
-------------------------------------
[~cloud_fan] I think this comes from https://github.com/apache/spark/pull/8215 (https://github.com/apache/spark/blob/1dc2c1d5e85c5f404f470aeb44c1f3c22786bdea/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala#L203). There we wrap the extracted value in an Alias named after the last extracted field. I am not sure whether this is the intended behavior, in which case this JIRA is invalid, or whether it should be changed. What do you think? Thanks.
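For anyone hitting this in the meantime, here is a minimal sketch of the naming behavior and of one possible workaround, an explicit alias on the nested field. It assumes a local SparkSession and reuses the reporter's Foo/Bar classes. The object name NestedAliasSketch, the helper names, and the alias "b_a" are made up for illustration and are not part of Spark or of the linked PR.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Same shape as the data in the report; declared at top level so Spark can derive the schema.
case class Bar(a: Int)
case class Foo(a: Int, b: Bar)

object NestedAliasSketch {
  // Default naming: the nested field b.a is exposed under its last name part, "a".
  def selectDefault(df: DataFrame): DataFrame =
    df.select(col("a"), col("b.a"))

  // Workaround: give the nested field an explicit, unique name.
  // "b_a" is an arbitrary name chosen for this sketch.
  def selectAliased(df: DataFrame): DataFrame =
    df.select(col("a"), col("b.a").as("b_a"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-23439-sketch")
      .getOrCreate()

    val df = spark.createDataFrame(Seq(Foo(1, Bar(1)), Foo(2, Bar(2))))

    // Both output columns share the name "a", as in the printSchema output of the report.
    println(selectDefault(df).columns.mkString(", "))

    // With the alias the names are distinct, so a follow-up select is unambiguous.
    selectAliased(df).select(col("b_a")).show()

    spark.stop()
  }
}
```

With the alias in place, a later select on the renamed column no longer has two candidates to choose from, whatever is decided about the default name.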
> Ambiguous reference when selecting column inside StructType with same name that outer column
> --------------------------------------------------------------------------------------------
>
> Key: SPARK-23439
> URL: https://issues.apache.org/jira/browse/SPARK-23439
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Environment: Scala 2.11.8, Spark 2.2.0
> Reporter: Alejandro Trujillo Caballero
> Priority: Minor
>
> Hi.
> I've noticed that when selecting nested struct fields from a DataFrame, the nesting is lost in the output column names, which can result in collisions between column names.
> For example:
>
> {code:java}
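> import spark.implicits._ // enables the $"..." column syntax (pre-imported in spark-shell)
> // Bar is declared first so that Foo can reference it when pasted line by line
> case class Bar(a: Int)
> case class Foo(a: Int, b: Bar)
> val items = List(
>   Foo(1, Bar(1)),
>   Foo(2, Bar(2))
> )
> val df = spark.createDataFrame(items)
> // show returns Unit, so its result is not assigned to a val
> df.select($"a", $"b.a").show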
> //+---+---+
> //| a| a|
> //+---+---+
> //| 1| 1|
> //| 2| 2|
> //+---+---+
> df.select($"a", $"b.a").printSchema
> //root
> //|-- a: integer (nullable = false)
> //|-- a: integer (nullable = true)
> df.select($"a", $"b.a").select($"a")
> //org.apache.spark.sql.AnalysisException: Reference 'a' is ambiguous, could be: a#9, a#
> {code}
>
>
> Shouldn't the second column be named "b.a"?
>
> Thanks.