Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2020/07/23 02:00:12 UTC

[jira] [Assigned] (SPARK-32280) AnalysisException thrown when query contains several JOINs

     [ https://issues.apache.org/jira/browse/SPARK-32280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-32280:
------------------------------------

    Assignee:     (was: Apache Spark)

> AnalysisException thrown when query contains several JOINs
> ----------------------------------------------------------
>
>                 Key: SPARK-32280
>                 URL: https://issues.apache.org/jira/browse/SPARK-32280
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5
>            Reporter: David Lindelöf
>            Priority: Major
>
> I've come across a curious {{AnalysisException}} raised by one of my SQL queries, even though the SQL appears legitimate. I was able to reduce it to the following example:
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> # A: a single row with one column, id = 1
> spark.sql('SELECT 1 AS id').createOrReplaceTempView('A')
> # B: A plus a constant 'kind' column
> spark.sql('''
>  SELECT id,
>  'foo' AS kind
>  FROM A''').createOrReplaceTempView('B')
> # C: a self-join of B on 'kind'
> spark.sql('''
>  SELECT l.id
>  FROM B AS l
>  JOIN B AS r
>  ON l.kind = r.kind''').createOrReplaceTempView('C')
> # The failing query: the identical B-JOIN-C subquery on both sides of a join
> spark.sql('''
>  SELECT 0
>  FROM (
>    SELECT *
>    FROM B
>    JOIN C
>    USING (id))
>  JOIN (
>    SELECT *
>    FROM B
>    JOIN C
>    USING (id))
>  USING (id)''')
> {code}
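> The duplicated attribute ids can be inspected by printing the plans of the reused view {{C}} in the same session ({{explain(True)}} prints the parsed, analyzed, optimized and physical plans):
> {code:python}
> # The analyzed plan shows the attribute ids (e.g. kind#2, kind#5)
> # produced by the self-join of B inside C.
> spark.table('C').explain(True)
> {code}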
> Running the final query above yields the following error:
> {code}
>  py4j.protocol.Py4JJavaError: An error occurred while calling o20.sql.
> : org.apache.spark.sql.AnalysisException: Resolved attribute(s) kind#11 missing from id#10,kind#2,id#7,kind#5 in operator !Join Inner, (kind#11 = kind#5). Attribute(s) with the same name appear in the operation: kind. Please check if the right attribute(s) are used.;;
> Project [0 AS 0#15]
> +- Project [id#0, kind#2, kind#11]
>    +- Join Inner, (id#0 = id#14)
>       :- SubqueryAlias `__auto_generated_subquery_name`
>       :  +- Project [id#0, kind#2]
>       :     +- Project [id#0, kind#2]
>       :        +- Join Inner, (id#0 = id#9)
>       :           :- SubqueryAlias `b`
>       :           :  +- Project [id#0, foo AS kind#2]
>       :           :     +- SubqueryAlias `a`
>       :           :        +- Project [1 AS id#0]
>       :           :           +- OneRowRelation
>       :           +- SubqueryAlias `c`
>       :              +- Project [id#9]
>       :                 +- Join Inner, (kind#2 = kind#5)
>       :                    :- SubqueryAlias `l`
>       :                    :  +- SubqueryAlias `b`
>       :                    :     +- Project [id#9, foo AS kind#2]
>       :                    :        +- SubqueryAlias `a`
>       :                    :           +- Project [1 AS id#9]
>       :                    :              +- OneRowRelation
>       :                    +- SubqueryAlias `r`
>       :                       +- SubqueryAlias `b`
>       :                          +- Project [id#7, foo AS kind#5]
>       :                             +- SubqueryAlias `a`
>       :                                +- Project [1 AS id#7]
>       :                                   +- OneRowRelation
>       +- SubqueryAlias `__auto_generated_subquery_name`
>          +- Project [id#14, kind#11]
>             +- Project [id#14, kind#11]
>                +- Join Inner, (id#14 = id#10)
>                   :- SubqueryAlias `b`
>                   :  +- Project [id#14, foo AS kind#11]
>                   :     +- SubqueryAlias `a`
>                   :        +- Project [1 AS id#14]
>                   :           +- OneRowRelation
>                   +- SubqueryAlias `c`
>                      +- Project [id#10]
>                         +- !Join Inner, (kind#11 = kind#5)
>                            :- SubqueryAlias `l`
>                            :  +- SubqueryAlias `b`
>                            :     +- Project [id#10, foo AS kind#2]
>                            :        +- SubqueryAlias `a`
>                            :           +- Project [1 AS id#10]
>                            :              +- OneRowRelation
>                            +- SubqueryAlias `r`
>                               +- SubqueryAlias `b`
>                                  +- Project [id#7, foo AS kind#5]
>                                     +- SubqueryAlias `a`
>                                        +- Project [1 AS id#7]
>                                           +- OneRowRelation
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:43)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:95)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:369)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:86)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:125)
> 	at scala.collection.immutable.List.foreach(List.scala:392)
> 	at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
> 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:86)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:95)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:108)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:105)
> 	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:201)
> 	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:105)
> 	at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:58)
> 	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:56)
> 	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:48)
> 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:78)
> 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> 	at py4j.Gateway.invoke(Gateway.java:282)
> 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
> 	at py4j.GatewayConnection.run(GatewayConnection.java:238)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}
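> Until this is fixed, a possible workaround (an untested sketch; the view name {{D}} is only for illustration) is to break the shared lineage of one side by rebuilding it from its RDD, which assigns fresh attribute ids:
> {code:python}
> # Untested sketch: each side analyzes fine on its own (only the combined
> # query fails), so materialize one side with fresh attribute ids first.
> side = spark.sql('SELECT * FROM B JOIN C USING (id)')
> fresh = spark.createDataFrame(side.rdd, side.schema)  # new attribute ids
> fresh.createOrReplaceTempView('D')
> spark.sql('SELECT 0 FROM D JOIN (SELECT * FROM B JOIN C USING (id)) USING (id)')
> {code}
> This mirrors the usual remedy for self-join ambiguity: once the two sides no longer share attribute ids, the analyzer has nothing left to de-duplicate.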


