Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/07/23 00:47:00 UTC

[jira] [Resolved] (SPARK-32347) BROADCAST hint makes a weird message that "column can't be resolved" (it was OK in Spark 2.4)

     [ https://issues.apache.org/jira/browse/SPARK-32347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-32347.
-----------------------------------
    Resolution: Duplicate

> BROADCAST hint makes a weird message that "column can't be resolved" (it was OK in Spark 2.4)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32347
>                 URL: https://issues.apache.org/jira/browse/SPARK-32347
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>         Environment: Spark 3.0.0, Jupyter notebook, Spark launched in local[4] mode; it fails the same way on a Standalone cluster.
>  
>  
>            Reporter: Ihor Bobak
>            Priority: Major
>         Attachments: 2020-07-17 17_46_32-Window.png, 2020-07-17 17_49_27-Window.png, 2020-07-17 17_52_51-Window.png
>
>
> The bug is easy to reproduce: run the same code below in Spark 2.4.3 and in 3.0.0.
> In 3.0.0 the analyzer raises a misleading error message, although the SQL statement is valid and runs fine in Spark 2.4.3.
> {code:python}
> # Assumes an active SparkSession available as `spark` (e.g. in a notebook).
> import pandas as pd
> pdf_sales = pd.DataFrame([(1, 10), (2, 20)], columns=["BuyerID", "Qty"])
> pdf_buyers = pd.DataFrame([(1, "John"), (2, "Jack")], columns=["BuyerID", "BuyerName"])
> df_sales = spark.createDataFrame(pdf_sales)
> df_buyers = spark.createDataFrame(pdf_buyers)
> df_sales.createOrReplaceTempView("df_sales")
> df_buyers.createOrReplaceTempView("df_buyers")
> spark.sql("""
>     with b as (
>         select /*+ BROADCAST(df_buyers) */
>             BuyerID, BuyerName 
>         from df_buyers
>     )
>     select 
>         b.BuyerID,
>         b.BuyerName,
>         s.Qty
>     from df_sales s
>         inner join b on s.BuyerID =  b.BuyerID
> """).toPandas()
> {code}
> The error message is wrong on its face: it claims `s.BuyerID` cannot be resolved even though `s.BuyerID` appears in its own list of input columns.
> ---------------------------------------------------------------------------
> AnalysisException                         Traceback (most recent call last)
> <ipython-input-4-8dfe318a59ee> in <module>
>      22     from df_sales s
>      23         inner join b on s.BuyerID =  b.BuyerID
> ---> 24 """).toPandas()
> /opt/spark-3.0.0-bin-without-hadoop/python/pyspark/sql/session.py in sql(self, sqlQuery)
>     644         [Row(f1=1, f2=u'row1'), Row(f1=2, f2=u'row2'), Row(f1=3, f2=u'row3')]
>     645         """
> --> 646         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
>     647 
>     648     @since(2.0)
> /opt/spark-3.0.0-bin-without-hadoop/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
>    1303         answer = self.gateway_client.send_command(command)
>    1304         return_value = get_return_value(
> -> 1305             answer, self.gateway_client, self.target_id, self.name)
>    1306 
>    1307         for temp_arg in temp_args:
> /opt/spark-3.0.0-bin-without-hadoop/python/pyspark/sql/utils.py in deco(*a, **kw)
>     135                 # Hide where the exception came from that shows a non-Pythonic
>     136                 # JVM exception message.
> --> 137                 raise_from(converted)
>     138             else:
>     139                 raise
> /opt/spark-3.0.0-bin-without-hadoop/python/pyspark/sql/utils.py in raise_from(e)
> AnalysisException: cannot resolve '`s.BuyerID`' given input columns: [s.BuyerID, b.BuyerID, b.BuyerName, s.Qty]; line 12 pos 24;
> 'Project ['b.BuyerID, 'b.BuyerName, 's.Qty]
> +- 'Join Inner, ('s.BuyerID = 'b.BuyerID)
>    :- SubqueryAlias s
>    :  +- SubqueryAlias df_sales
>    :     +- LogicalRDD [BuyerID#23L, Qty#24L], false
>    +- SubqueryAlias b
>       +- Project [BuyerID#27L, BuyerName#28]
>          +- SubqueryAlias df_buyers
>             +- LogicalRDD [BuyerID#27L, BuyerName#28], false
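> A possible workaround, offered as a sketch rather than a confirmed fix: express the same broadcast join through the DataFrame API instead of a SQL hint inside a CTE, since the hint-in-CTE path appears to be the trigger. `broadcast` below is the standard `pyspark.sql.functions` helper, and the snippet reuses the `df_sales` and `df_buyers` DataFrames from the repro above.
> {code:python}
> from pyspark.sql.functions import broadcast
>
> # Same inner join on BuyerID, expressed via the DataFrame API.
> # broadcast() marks df_buyers as a broadcast candidate, so no SQL
> # hint (and no CTE) is involved.
> result = df_sales.join(broadcast(df_buyers), on="BuyerID", how="inner")
> result.toPandas()
> {code}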



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org