Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2022/10/24 11:15:00 UTC
[jira] [Updated] (SPARK-32082) Project Zen: Improving Python usability
[ https://issues.apache.org/jira/browse/SPARK-32082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-32082:
---------------------------------
Fix Version/s: 3.4.0
> Project Zen: Improving Python usability
> ---------------------------------------
>
> Key: SPARK-32082
> URL: https://issues.apache.org/jira/browse/SPARK-32082
> Project: Spark
> Issue Type: Epic
> Components: PySpark
> Affects Versions: 3.1.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Critical
> Fix For: 3.4.0
>
>
> The importance of Python and PySpark has grown radically in the last few years. PySpark downloads reached [more than 1.3 million _every week_|https://pypistats.org/packages/pyspark], counting _only_ PyPI. Nevertheless, PySpark is still not Pythonic enough. For example, it exposes many JVM error messages, and its API documentation is poorly written.
> This epic ticket aims to improve the usability of PySpark and make it more Pythonic. More specifically, this JIRA targets the four areas below, each with examples:
> * Being Pythonic
> ** Pandas UDF enhancements and type hints
> ** Avoid dynamic function definitions, for example in {{functions.py}}, because IDEs cannot detect them.
> * Better and easier usability in PySpark
> ** User-facing error message and warnings
> ** Documentation
> ** User guide
> ** Better examples and API documentation, e.g. [Koalas|https://koalas.readthedocs.io/en/latest/] and [pandas|https://pandas.pydata.org/docs/]
> * Better interoperability with other Python libraries
> ** Visualization and plotting
> ** Potentially better interface by leveraging Arrow
> ** Compatibility with other libraries such as NumPy universal functions or pandas possibly by leveraging Koalas
> * PyPI Installation
> ** PySpark with Hadoop 3 support on PyPI
> ** Better error handling
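The "avoid dynamic function definitions" item can be illustrated with a small, self-contained sketch. This is not PySpark's actual code: `_make_function`, `dyn_upper`, `dyn_lower`, `upper`, and `lower` are hypothetical names, and the functions return plain strings rather than Spark columns. The first style generates functions in a loop and injects them through `globals()`, so a reader or IDE scanning the source finds no `def` for them. The second style defines each function explicitly with type hints, which tools can read without executing the module.

```python
import typing


# Style 1 (hypothetical): functions generated at import time.
# No `def dyn_upper` appears in the source, so an IDE cannot offer
# completion, signatures, or go-to-definition for these names.
def _make_function(name: str, doc: str):
    # A factory gives each generated function its own `name` binding.
    def _(col):
        return f"{name}({col})"

    _.__name__ = name
    _.__doc__ = doc
    return _


_functions = {
    "dyn_upper": "Converts a string expression to upper case.",
    "dyn_lower": "Converts a string expression to lower case.",
}

for _name, _doc in _functions.items():
    globals()[_name] = _make_function(_name, _doc)


# Style 2 (hypothetical): explicit definitions with type hints.
# The signature, docstring, and return type are visible in the source.
def upper(col: str) -> str:
    """Converts a string expression to upper case."""
    return f"upper({col})"


def lower(col: str) -> str:
    """Converts a string expression to lower case."""
    return f"lower({col})"


if __name__ == "__main__":
    # Both styles behave the same at runtime...
    print(globals()["dyn_upper"]("name"), upper("name"))
    # ...but only the explicit style carries annotations that tools can read.
    print(typing.get_type_hints(globals()["dyn_upper"]))
    print(typing.get_type_hints(upper))
```

Runtime behavior is identical in both styles; the difference is that only the explicit definitions can be resolved by static tools such as IDEs and type checkers.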