Posted to issues@spark.apache.org by "Maciej Szymkiewicz (Jira)" <ji...@apache.org> on 2022/01/05 19:13:00 UTC

[jira] [Assigned] (SPARK-37788) ColumnOrName vs Column in PySpark Functions module

     [ https://issues.apache.org/jira/browse/SPARK-37788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maciej Szymkiewicz reassigned SPARK-37788:
------------------------------------------

    Assignee: Daniel Davies

> ColumnOrName vs Column in PySpark Functions module
> --------------------------------------------------
>
>                 Key: SPARK-37788
>                 URL: https://issues.apache.org/jira/browse/SPARK-37788
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Daniel Davies
>            Assignee: Daniel Davies
>            Priority: Minor
>
> PySpark has mostly migrated to accepting both Column objects and column names given as strings ("ColumnOrName") in its functions module. A small number of functions still seem to need updating: either a conversion of input string names into the Column type, or just an annotation change to indicate that the function already supports column names.
> Below are the functions I've seen:
>  * F.overlay: Annotation only
>  * F.least: Annotation only
>  * F.slice: Needs a conversion
>  * F.array_repeat: Needs a conversion
> See here for additional context: [https://github.com/apache/spark/pull/35032#issuecomment-1003033776]
> I'm happy to make a quick PR fixing these, unless there is a reason these functions are handled as a special case.
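
For readers unfamiliar with the distinction drawn above, the following is a minimal sketch of the two cases ("annotation only" vs. "needs a conversion"). It deliberately does not import PySpark: `Column`, `to_column`, `least_sketch` and `array_repeat_sketch` are hypothetical stand-ins invented for illustration, and the handling of `count` is an assumption, not the actual PySpark implementation or the eventual fix.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column expression (not pyspark.sql.Column)."""
    expr: str


# Mirrors the idea behind the "ColumnOrName" alias named in the issue:
# an argument may be a Column or the string name of a column.
ColumnOrName = Union[Column, str]


def to_column(col: ColumnOrName) -> Column:
    """Hypothetical helper: wrap a column name in a Column, pass a Column through."""
    if isinstance(col, Column):
        return col
    if isinstance(col, str):
        return Column(col)
    raise TypeError(f"expected Column or str, got {type(col).__name__}")


def least_sketch(*cols: ColumnOrName) -> Column:
    """'Annotation only': the body already converts every argument, so
    widening the type hint from Column to ColumnOrName is the whole change."""
    return Column("least(" + ", ".join(to_column(c).expr for c in cols) + ")")


def array_repeat_sketch(col: ColumnOrName, count: Union[ColumnOrName, int]) -> Column:
    """'Needs a conversion': a string passed as `count` must be turned into a
    Column explicitly (assumed behavior, for illustration only)."""
    count_expr = str(count) if isinstance(count, int) else to_column(count).expr
    return Column(f"array_repeat({to_column(col).expr}, {count_expr})")
```

With these stand-ins, `least_sketch("a", Column("b"))` and `array_repeat_sketch("a", "n")` both accept column names, which is the behavior the issue asks the listed functions to support consistently.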


