Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2019/07/05 20:52:27 UTC

Revisiting Python / pandas UDF

Hi all,

Over the past two years, pandas UDFs have perhaps been the most important change to Spark for Python data science. However, this functionality has evolved organically, which has led to some inconsistencies and confusion among users. I created a ticket and a document that summarize the issues and make a concrete proposal to fix them (the changes are fairly small). Thanks to Xiangrui for first bringing this to my attention, and to Li Jin and Hyukjin for the offline discussions.

Please take a look: 

https://issues.apache.org/jira/browse/SPARK-28264

https://docs.google.com/document/u/1/d/10Pkl-rqygGao2xQf6sddt0b-4FYK4g8qr_bXLKTL65A/edit