You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2019/07/08 02:06:00 UTC

[jira] [Commented] (SPARK-28264) Revisiting Python / pandas UDF

    [ https://issues.apache.org/jira/browse/SPARK-28264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16879980#comment-16879980 ] 

Sean Owen commented on SPARK-28264:
-----------------------------------

I generally like the rationalization of the various UDF types, as they do different things, things which aren't so obvious from the names. Anything we can do to clarify is a win.

> Revisiting Python / pandas UDF
> ------------------------------
>
>                 Key: SPARK-28264
>                 URL: https://issues.apache.org/jira/browse/SPARK-28264
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>    Affects Versions: 3.0.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Major
>
> In the past two years, the pandas UDFs are perhaps the most important changes to Spark for Python data science. However, these functionalities have evolved organically, leading to some inconsistencies and confusions among users. This document revisits UDF definition and naming, as a result of discussions among Xiangrui, Li Jin, Hyukjin, and Reynold.
>  
> See document here: [https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit#|https://docs.google.com/document/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org