You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/10/10 04:56:00 UTC
[jira] [Comment Edited] (SPARK-32082) Project Zen: Improving Python usability

    [ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211560#comment-17211560 ] 

Hyukjin Kwon edited comment on SPARK-32082 at 10/10/20, 4:55 AM:
-----------------------------------------------------------------

Nope, these are all the JIRAs linked here. I should still collect feedback and investigate with a proper design for that. Feel free to send an email to dev mailing list (with cc'ing me) or file a JIRA if you have a concrete idea.


was (Author: hyukjin.kwon):
Nope, these are all the JIRAs linked here. I should still collect feedback and investigate with a proper design for that. Feel free to send an email (with cc'ing me) or file a JIRA if you have a concrete idea.

> Project Zen: Improving Python usability
> ---------------------------------------
>
>                 Key: SPARK-32082
>                 URL: https://issues.apache.org/jira/browse/SPARK-32082
>             Project: Spark
>          Issue Type: Epic
>          Components: PySpark
>    Affects Versions: 3.1.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Critical
>
> The importance of Python and PySpark has grown radically in the last few years. The number of PySpark downloads reached [more than 1.3 million _every week_|https://pypistats.org/packages/pyspark] when we count them _only_ in PyPI. Nevertheless, PySpark is still less Pythonic. It exposes many JVM error messages as an example, and the API documentation is poorly written.
> This epic tickets aims to improve the usability in PySpark, and make it more Pythonic. To be more explicit, this JIRA targets four bullet points below. Each includes examples:
>  * Being Pythonic
>  ** Pandas UDF enhancements and type hints
>  ** Avoid dynamic function definitions, for example, at {{funcitons.py}} which makes IDEs unable to detect.
>  * Better and easier usability in PySpark
>  ** User-facing error message and warnings
>  ** Documentation
>  ** User guide
>  ** Better examples and API documentation, e.g. [Koalas|https://koalas.readthedocs.io/en/latest/] and [pandas|https://pandas.pydata.org/docs/]
>  * Better interoperability with other Python libraries
>  ** Visualization and plotting
>  ** Potentially better interface by leveraging Arrow
>  ** Compatibility with other libraries such as NumPy universal functions or pandas possibly by leveraging Koalas
>  * PyPI Installation
>  ** PySpark with Hadoop 3 support on PyPi
>  ** Better error handling



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org