Posted to issues@spark.apache.org by "Daniel Davies (Jira)" <ji...@apache.org> on 2021/12/30 14:35:00 UTC

[jira] [Updated] (SPARK-37788) ColumnOrName vs Column in PySpark Functions module

     [ https://issues.apache.org/jira/browse/SPARK-37788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Davies updated SPARK-37788:
----------------------------------
    Description: 
PySpark has mostly migrated to accepting both Column objects and string column names ("ColumnOrName") in its functions module. A small number of functions still seem to need updating: either the function must convert input string names into the Column type, or only the type annotation needs changing to indicate that the function already supports string column names.

Below are the functions I've seen:
 * F.overlay: Annotation only
 * F.least: Annotation only
 * F.slice: Needs a conversion
 * F.array_repeat: Needs a conversion

See here for additional context: [https://github.com/apache/spark/pull/35032#issuecomment-1003033776]

I'm happy to open a quick PR fixing these, unless there is a reason these functions are handled as a special case.
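
To illustrate the two categories in the list above, here is a minimal, self-contained sketch. It does not use PySpark itself: the Column class is a stand-in, and to_column, repeat_before, and repeat_after are hypothetical names chosen for illustration, not PySpark's actual internals. It only shows why a function that uses its argument as a Column directly needs a conversion step before it can accept a string name.

```python
from typing import Union


class Column:
    """Stand-in for a Column type (assumption: not pyspark.sql.Column)."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


# Same idea as the "ColumnOrName" alias: either a Column or a column name.
ColumnOrName = Union[Column, str]


def to_column(col: ColumnOrName) -> Column:
    """Hypothetical helper: wrap a name in a Column, pass Columns through."""
    return Column(col) if isinstance(col, str) else col


def repeat_before(col: Column, count: int) -> Column:
    # "Needs a conversion": the argument is used as a Column directly,
    # so passing a plain string name fails on the attribute access.
    return Column(f"repeat({col.expr}, {count})")


def repeat_after(col: ColumnOrName, count: int) -> Column:
    # After the fix: convert first, and widen the annotation to match.
    return Column(f"repeat({to_column(col).expr}, {count})")
```

In the "annotation only" case the body would already look like repeat_after, and only the type hint on the parameter would change from Column to ColumnOrName.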

  was:
PySpark has mostly migrated to accepting both Column objects and string column names ("ColumnOrName") in its functions module. A small number of functions still seem to need updating: either the function must convert input string names into the Column type, or only the type annotation needs changing to indicate that the function already supports string column names.

Below are the functions I've seen:
 * F.overlay: Annotation only
 * F.least: Annotation only
 * F.slice: Needs a conversion
 * F.array_repeat: Needs a conversion

See here for additional context: [https://github.com/apache/spark/pull/35032#issuecomment-1003033776]


> ColumnOrName vs Column in PySpark Functions module
> --------------------------------------------------
>
>                 Key: SPARK-37788
>                 URL: https://issues.apache.org/jira/browse/SPARK-37788
>             Project: Spark
>          Issue Type: Question
>          Components: PySpark
>    Affects Versions: 3.2.0
>            Reporter: Daniel Davies
>            Priority: Minor
>
> PySpark has mostly migrated to accepting both Column objects and string column names ("ColumnOrName") in its functions module. A small number of functions still seem to need updating: either the function must convert input string names into the Column type, or only the type annotation needs changing to indicate that the function already supports string column names.
> Below are the functions I've seen:
>  * F.overlay: Annotation only
>  * F.least: Annotation only
>  * F.slice: Needs a conversion
>  * F.array_repeat: Needs a conversion
> See here for additional context: [https://github.com/apache/spark/pull/35032#issuecomment-1003033776]
> I'm happy to open a quick PR fixing these, unless there is a reason these functions are handled as a special case.


