Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2020/12/23 20:08:00 UTC

[jira] [Commented] (SPARK-33871) Cannot access to column after left semi join and left join

    [ https://issues.apache.org/jira/browse/SPARK-33871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254224#comment-17254224 ] 

L. C. Hsieh commented on SPARK-33871:
-------------------------------------

For a self-join, Spark adds aliases to ambiguous columns in the join query. But in semiJoin as a query, the column col still refers to df.col. So left.select(semiJoin(col)) and left.select(df(col)) are basically selecting the same column.

If you want to access the column col of the semi join in the left join, a workaround is to give the semi join a relation alias and access col through that alias.

{code}
scala> val semiJoin = df.join(df2, df(col) === df2(col), "left_semi").as("left_semi")
scala> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
scala> left.select("left_semi.c1").show

+----+
|  c1|
+----+
|   1|
|null|
|null|
|null|
+----+

{code}
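
The same alias idea can be applied to both sides of the left join, so that each c1 is addressed through its own qualifier rather than through df(col). The following is only a minimal sketch, assuming a local SparkSession; the object name, the app name, and the alias names "l" and "r" are illustrative and do not come from the issue.

```scala
import org.apache.spark.sql.SparkSession

object SemiJoinAliasSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("semi-join-alias-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Same data as in the issue description.
    val col = "c1"
    val df = Seq((1, "a"), (2, "a"), (3, "a"), (4, "a")).toDF(col, "c2")
    val df2 = Seq(1).toDF(col)

    // Alias the semi join result ("r") and the outer side ("l").
    // Both alias names are hypothetical, chosen for this sketch.
    val semiJoin = df.join(df2, df(col) === df2(col), "left_semi").as("r")
    val left = df.as("l").join(semiJoin, $"l.c1" === $"r.c1", "left")

    // l.c1 is the outer side's column; r.c1 is the semi join's column.
    left.select($"l.c1", $"r.c1").show()

    spark.stop()
  }
}
```

Referring to columns by qualified name keeps the two c1 columns distinct in both the join condition and the final select.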

> Cannot access to column after left semi join and left join
> ----------------------------------------------------------
>
>                 Key: SPARK-33871
>                 URL: https://issues.apache.org/jira/browse/SPARK-33871
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Evgenii Samusenko
>            Priority: Minor
>
> Cannot access to column after left semi join and left join
> {code}
> val col = "c1"
> val df = Seq((1, "a"),(2, "a"),(3, "a"),(4, "a")).toDF(col, "c2")
> val df2 = Seq(1).toDF(col)
> val semiJoin = df.join(df2, df(col) === df2(col), "left_semi")
> val left = df.join(semiJoin, df(col) === semiJoin(col), "left")
> left.show
> +---+---+----+----+
> | c1| c2|  c1|  c2|
> +---+---+----+----+
> |  1|  a|   1|   a|
> |  2|  a|null|null|
> |  3|  a|null|null|
> |  4|  a|null|null|
> +---+---+----+----+
> left.select(semiJoin(col)).show
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> left.select(df(col)).show
> +---+
> | c1|
> +---+
> |  1|
> |  2|
> |  3|
> |  4|
> +---+
> {code}
