Posted to issues@spark.apache.org by "Jakub Nowacki (JIRA)" <ji...@apache.org> on 2017/10/06 08:06:02 UTC

[jira] [Created] (SPARK-22212) Some SQL functions in Python fail with string column name

Jakub Nowacki created SPARK-22212:
-------------------------------------

             Summary: Some SQL functions in Python fail with string column name 
                 Key: SPARK-22212
                 URL: https://issues.apache.org/jira/browse/SPARK-22212
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 2.2.0
            Reporter: Jakub Nowacki
            Priority: Minor


Most of the functions in {{pyspark.sql.functions}} accept either a column name string or a {{Column}} object. However, some functions, such as {{trim}}, only work when a {{Column}} is passed. The code below shows the difference.

{code}
>>> import pyspark.sql.functions as func
>>> df = spark.createDataFrame([tuple(l) for l in "abcde"], ["text"])
>>> df.select(func.trim(df["text"])).show()
+----------+
|trim(text)|
+----------+
|         a|
|         b|
|         c|
|         d|
|         e|
+----------+
>>> df.select(func.trim("text")).show()
[...]
Py4JError: An error occurred while calling z:org.apache.spark.sql.functions.trim. Trace:
py4j.Py4JException: Method trim([class java.lang.String]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:339)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Thread.java:748)
{code}

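This happens because most Python wrappers convert a column name string into a {{Column}} before calling the corresponding JVM function, but functions created via {{_create_function}} pass any argument that is not a {{Column}} through unchanged. The call then only succeeds if the Scala function also has a {{String}} overload, which {{trim}} does not, as the Py4J error above shows.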
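To make the mechanism concrete, here is a minimal pure-Python simulation of the dispatch path. It is a sketch under stated assumptions, not Spark code: {{PyColumn}}, {{JvmColumn}}, {{jvm_col}}, {{jvm_trim}}, {{to_jvm_column}}, {{make_passthrough}} and {{make_normalizing}} are hypothetical stand-ins for {{pyspark.sql.Column}}, the JVM-side column, the Scala {{functions}} object and the two wrapper strategies. The Py4J overload lookup is modelled as a plain type check, and {{make_normalizing}} shows only one possible shape of a fix, not necessarily the one in the upcoming PR.

{code:python}
# Minimal simulation of the PySpark -> Py4J -> JVM dispatch described above.
# All names here are hypothetical stand-ins, not PySpark's actual internals.


class JvmColumn:
    """Stand-in for a JVM-side org.apache.spark.sql.Column handle."""

    def __init__(self, expr):
        self.expr = expr


class PyColumn:
    """Stand-in for pyspark.sql.Column, which wraps a JVM column in _jc."""

    def __init__(self, jc):
        self._jc = jc


def jvm_col(name):
    """Stand-in for the JVM functions.col(String) lookup."""
    return JvmColumn(name)


def jvm_trim(jcol):
    """Stand-in for the JVM functions.trim, which only accepts a Column.

    Models Py4J failing to find a trim(String) overload for a raw str.
    """
    if not isinstance(jcol, JvmColumn):
        raise TypeError(
            "Method trim([class %s]) does not exist" % type(jcol).__name__)
    return JvmColumn("trim(%s)" % jcol.expr)


def to_jvm_column(col):
    """Normalize a PyColumn or a column name string to a JVM column."""
    if isinstance(col, PyColumn):
        return col._jc
    if isinstance(col, str):
        return jvm_col(col)
    raise TypeError("expected PyColumn or str, got %s" % type(col).__name__)


def make_passthrough(jvm_fn):
    """Reported behaviour: anything that is not a PyColumn is passed as is."""

    def wrapper(col):
        arg = col._jc if isinstance(col, PyColumn) else col
        return PyColumn(jvm_fn(arg))

    return wrapper


def make_normalizing(jvm_fn):
    """Possible fix: turn column names into columns before the JVM call."""

    def wrapper(col):
        return PyColumn(jvm_fn(to_jvm_column(col)))

    return wrapper


trim_old = make_passthrough(jvm_trim)
trim_new = make_normalizing(jvm_trim)

print(trim_new("text")._jc.expr)  # trim(text)
try:
    trim_old("text")
except TypeError as err:
    print(err)  # Method trim([class str]) does not exist
{code}

In this simulation, passing a column object works through either wrapper, mirroring the {{df["text"]}} call in the report. On the user side, wrapping the name explicitly, e.g. {{func.trim(func.col("text"))}}, should likewise avoid the error until a fix lands.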

I am preparing a PR with the proposed fix.



