Posted to issues@spark.apache.org by "Ruifeng Zheng (Jira)" <ji...@apache.org> on 2023/02/15 02:47:00 UTC

[jira] [Created] (SPARK-42444) DataFrame.drop should handle multi columns properly

Ruifeng Zheng created SPARK-42444:
-------------------------------------

             Summary: DataFrame.drop should handle multi columns properly
                 Key: SPARK-42444
                 URL: https://issues.apache.org/jira/browse/SPARK-42444
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng



{code:python}
from pyspark.sql import Row

# Assumes an active SparkSession bound to `spark` (e.g. the pyspark shell).
df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()
{code}

This works in 3.3.0:

{code:java}
+------+
|height|
+------+
|    85|
|    80|
+------+
{code}

but fails in 3.4.0:


{code:java}
---------------------------------------------------------------------------
AnalysisException                         Traceback (most recent call last)
Cell In[1], line 4
      2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
      3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])
----> 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show()

File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in DataFrame.drop(self, *cols)
   4911     jcols = [_to_java_column(c) for c in cols]
   4912     first_column, *remaining_columns = jcols
-> 4913     jdf = self._jdf.drop(first_column, self._jseq(remaining_columns))
   4915 return DataFrame(jdf, self.sparkSession)

File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in capture_sql_exception.<locals>.deco(*a, **kw)
    155 converted = convert_exception(e.java_exception)
    156 if not isinstance(converted, UnknownException):
    157     # Hide where the exception came from that shows a non-Pythonic
    158     # JVM exception message.
--> 159     raise converted from None
    160 else:
    161     raise

AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, `name`].

{code}
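
For context, the traceback shows that in 3.4 DataFrame.drop converts every string argument to a Column up front (the {{_to_java_column}} call) before invoking the JVM drop, so the bare name {{name}} has to be resolved against the joined plan and hits AMBIGUOUS_REFERENCE; 3.3 passed the names through as strings, which drops all matching columns (hence the 3.3.0 output above). A minimal workaround sketch, not a fix for the regression and assuming the Column-based drop path still resolves correctly in 3.4, is to drop the duplicated column through explicit references to the parent DataFrames:

{code:python}
from pyspark.sql import Row

# Assumes an active SparkSession bound to `spark`, as in the repro above.
df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])

joined = df1.join(df2, df1.name == df2.name, 'inner')

# Drop each ambiguous 'name' column via a Column reference to its parent
# DataFrame instead of the bare string, then drop the unambiguous 'age' by name.
joined.drop(df1.name).drop(df2.name).drop('age').show()
{code}

With the Column references each {{name}} column is identified unambiguously, so the result should again be the single {{height}} column, matching the 3.3.0 output.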




