Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/10/10 04:56:00 UTC
[jira] [Comment Edited] (SPARK-32082) Project Zen: Improving Python usability
[ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17211560#comment-17211560 ]
Hyukjin Kwon edited comment on SPARK-32082 at 10/10/20, 4:55 AM:
-----------------------------------------------------------------
Nope, these are all the JIRAs linked here. I still need to collect feedback and investigate a proper design for that. Feel free to send an email to the dev mailing list (cc'ing me) or file a JIRA if you have a concrete idea.
was (Author: hyukjin.kwon):
Nope, these are all the JIRAs linked here. I should still collect feedback and investigate with a proper design for that. Feel free to send an email (with cc'ing me) or file a JIRA if you have a concrete idea.
> Project Zen: Improving Python usability
> ---------------------------------------
>
> Key: SPARK-32082
> URL: https://issues.apache.org/jira/browse/SPARK-32082
> Project: Spark
> Issue Type: Epic
> Components: PySpark
> Affects Versions: 3.1.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Critical
>
> The importance of Python and PySpark has grown radically in the last few years. PySpark downloads have reached [more than 1.3 million _every week_|https://pypistats.org/packages/pyspark], counting PyPI _alone_. Nevertheless, PySpark is still not very Pythonic. For example, it exposes many JVM error messages, and the API documentation is poorly written.
> This epic ticket aims to improve PySpark's usability and make it more Pythonic. More explicitly, this JIRA targets the four areas below, each with examples:
> * Being Pythonic
> ** Pandas UDF enhancements and type hints
> ** Avoid dynamic function definitions, for example in {{functions.py}}, which IDEs are unable to detect.
> * Better and easier usability in PySpark
> ** User-facing error messages and warnings
> ** Documentation
> ** User guide
> ** Better examples and API documentation, e.g. like those of [Koalas|https://koalas.readthedocs.io/en/latest/] and [pandas|https://pandas.pydata.org/docs/]
> * Better interoperability with other Python libraries
> ** Visualization and plotting
> ** Potentially better interface by leveraging Arrow
> ** Compatibility with other libraries, such as NumPy universal functions or pandas, possibly by leveraging Koalas
> * PyPI Installation
> ** PySpark with Hadoop 3 support on PyPI
> ** Better error handling
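
As a rough illustration of the "avoid dynamic function definitions" item in the description above, the sketch below contrasts the two styles in plain Python. It is not PySpark's actual code: {{_invoke}}, {{generated}}, and the {{_dyn}} names are hypothetical stand-ins, and the string returned by {{_invoke}} merely imitates a call into the JVM.

```python
# Hypothetical stand-in for a JVM-backed call; not a real PySpark API.
def _invoke(name, col):
    return f"{name}({col})"


# Dynamic style: wrappers are built in a loop and injected into the module
# namespace at import time, so static tools (IDEs, type checkers) cannot see
# the resulting names or their signatures.
def _make(name, doc):
    def _(col):
        return _invoke(name, col)
    _.__name__ = name + "_dyn"
    _.__doc__ = doc
    return _


_specs = {
    "lower": "Converts a string expression to lower case.",
    "upper": "Converts a string expression to upper case.",
}
generated = {name + "_dyn": _make(name, doc) for name, doc in _specs.items()}
globals().update(generated)


# Static style: an ordinary definition with type hints and a docstring,
# which IDEs can index, autocomplete, and type-check.
def lower(col: str) -> str:
    """Converts a string expression to lower case."""
    return _invoke("lower", col)
```

Both styles behave the same at runtime. The difference is that only the explicit {{def}} is visible to tools that read the source without executing it, and only it carries type annotations.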