Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2021/03/01 18:33:00 UTC

[jira] [Resolved] (SPARK-34560) Cannot join datasets of SHOW TABLES

     [ https://issues.apache.org/jira/browse/SPARK-34560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34560.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31675
[https://github.com/apache/spark/pull/31675]

> Cannot join datasets of SHOW TABLES
> -----------------------------------
>
>                 Key: SPARK-34560
>                 URL: https://issues.apache.org/jira/browse/SPARK-34560
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>             Fix For: 3.2.0
>
>
> The following example illustrates the issue:
> {code:scala}
> scala> sql("CREATE NAMESPACE ns1")
> res8: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE NAMESPACE ns2")
> res9: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns1.tbl1 (c INT)")
> res10: org.apache.spark.sql.DataFrame = []
> scala> sql("CREATE TABLE ns2.tbl2 (c INT)")
> res11: org.apache.spark.sql.DataFrame = []
> scala> val show1 = sql("SHOW TABLES IN ns1")
> show1: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]
> scala> val show2 = sql("SHOW TABLES IN ns2")
> show2: org.apache.spark.sql.DataFrame = [namespace: string, tableName: string ... 1 more field]
> scala> show1.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns1|     tbl1|      false|
> +---------+---------+-----------+
> scala> show2.show
> +---------+---------+-----------+
> |namespace|tableName|isTemporary|
> +---------+---------+-----------+
> |      ns2|     tbl2|      false|
> +---------+---------+-----------+
> scala> show1.join(show2).where(show1("tableName") =!= show2("tableName")).show
> org.apache.spark.sql.AnalysisException: Column tableName#17 are ambiguous. It's probably because you joined several Datasets together, and some of these Datasets are the same. This column points to one of the Datasets but Spark is unable to figure out which one. Please alias the Datasets with different names via `Dataset.as` before joining them, and specify the column using qualified name, e.g. `df.as("a").join(df.as("b"), $"a.id" > $"b.id")`. You can also set spark.sql.analyzer.failAmbiguousSelfJoin to false to disable this check.
>   at org.apache.spark.sql.execution.analysis.DetectAmbiguousSelfJoin$.apply(DetectAmbiguousSelfJoin.scala:157)
> {code}
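The exception message suggests its own workaround: alias the Datasets with `Dataset.as` and refer to columns by qualified name. Below is a minimal sketch of that suggestion. It assumes a build that does not yet include the fix from pull request 31675 and has `spark-sql` on the classpath. The object and method names are hypothetical. `USING parquet` is added so the tables can be created without Hive support. It has not been checked against every affected build.

```scala
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper illustrating the workaround named in the AnalysisException:
// alias both sides with Dataset.as and use qualified column names.
object ShowTablesJoinWorkaround {

  // Local session with a throwaway warehouse directory so repeated runs
  // do not collide with tables left over from earlier runs.
  def localSession(): SparkSession =
    SparkSession.builder()
      .master("local[1]")
      .appName("show-tables-join-workaround")
      .config("spark.sql.warehouse.dir",
        Files.createTempDirectory("spark-warehouse").toString)
      .getOrCreate()

  def joinedTables(spark: SparkSession): DataFrame = {
    import spark.implicits._

    spark.sql("CREATE NAMESPACE IF NOT EXISTS ns1")
    spark.sql("CREATE NAMESPACE IF NOT EXISTS ns2")
    // Assumption: USING parquet so CREATE TABLE does not require Hive support.
    spark.sql("CREATE TABLE IF NOT EXISTS ns1.tbl1 (c INT) USING parquet")
    spark.sql("CREATE TABLE IF NOT EXISTS ns2.tbl2 (c INT) USING parquet")

    val show1 = spark.sql("SHOW TABLES IN ns1")
    val show2 = spark.sql("SHOW TABLES IN ns2")

    // Instead of show1("tableName") / show2("tableName"), alias each side
    // and refer to the columns through the aliases "a" and "b".
    show1.as("a")
      .join(show2.as("b"))
      .where($"a.tableName" =!= $"b.tableName")
      .select($"a.tableName", $"b.tableName")
  }

  def main(args: Array[String]): Unit = {
    val spark = localSession()
    try joinedTables(spark).show()
    finally spark.stop()
  }
}
```

Qualified names such as `$"a.tableName"` are presumably resolved through the aliases. They do not go through the per-Dataset column references that `DetectAmbiguousSelfJoin` inspects, which is why this form should sidestep the check. On 3.2.0 and later, where the issue is marked fixed, the original `show1("tableName") =!= show2("tableName")` form is expected to work as written.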



--
This message was sent by Atlassian Jira
(v8.3.4#803005)