Posted to issues@spark.apache.org by "Marc de Lignie (Jira)" <ji...@apache.org> on 2020/12/31 10:46:00 UTC
[jira] [Created] (SPARK-33952) Python-friendly dtypes for pyspark dataframes
Marc de Lignie created SPARK-33952:
--------------------------------------
Summary: Python-friendly dtypes for pyspark dataframes
Key: SPARK-33952
URL: https://issues.apache.org/jira/browse/SPARK-33952
Project: Spark
Issue Type: Task
Components: PySpark
Affects Versions: 3.0.1
Reporter: Marc de Lignie
The pyspark.sql.DataFrame.dtypes attribute represents column datatypes as strings in terms of JVM datatypes. For a Python user, however, it is a significant mental step to translate these to the corresponding Python types encountered in UDFs and collected dataframes. This holds in particular for nested composite datatypes (array, map and struct). It is proposed to provide Python-friendly dtypes in pyspark (as an addition, not a replacement) in which array<>, map<> and struct<> are rendered as [], {} and Row().
Sample code, including tests, is available as a [gist on GitHub|https://gist.github.com/vtslab/81ded1a7af006100e00bf2a4a70a8147]. More explanation is provided at: [https://yaaics.blogspot.com/2020/12/python-friendly-dtypes-for-pyspark.html]
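To illustrate the proposed translation, here is a minimal sketch that rewrites the simple dtype strings returned by DataFrame.dtypes. The function name friendly_dtype and the primitive-type mapping are illustrative assumptions, not the linked implementation (which could equally walk the DataFrame's StructType schema objects instead of parsing strings):

```python
# Sketch: translate a JVM-style dtype string (as found in
# pyspark.sql.DataFrame.dtypes) into a Python-friendly form in which
# array<>, map<> and struct<> become [], {} and Row().
# Illustrative only; a robust implementation would walk the StructType
# schema rather than parse strings.

# Assumed mapping from Spark SQL simple strings to Python type names.
_PRIMITIVES = {
    "tinyint": "int", "smallint": "int", "int": "int", "bigint": "int",
    "float": "float", "double": "float",
    "string": "str", "boolean": "bool", "binary": "bytearray",
    "date": "datetime.date", "timestamp": "datetime.datetime",
}

def _split_top(s):
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, c in enumerate(s):
        if c == "<":
            depth += 1
        elif c == ">":
            depth -= 1
        elif c == "," and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])
    return parts

def friendly_dtype(dt):
    """Recursively rewrite a dtype string into the proposed notation."""
    dt = dt.strip()
    if dt.startswith("array<") and dt.endswith(">"):
        return "[%s]" % friendly_dtype(dt[6:-1])
    if dt.startswith("map<") and dt.endswith(">"):
        key, value = _split_top(dt[4:-1])
        return "{%s: %s}" % (friendly_dtype(key), friendly_dtype(value))
    if dt.startswith("struct<") and dt.endswith(">"):
        fields = []
        for field in _split_top(dt[7:-1]):
            name, _, ftype = field.partition(":")
            fields.append("%s=%s" % (name.strip(), friendly_dtype(ftype)))
        return "Row(%s)" % ", ".join(fields)
    return _PRIMITIVES.get(dt, dt)

print(friendly_dtype("array<struct<id:bigint,tags:map<string,string>>>"))
# -> [Row(id=int, tags={str: str})]
```

The output mirrors what a user would actually see after collect(): a list of Row objects whose fields carry Python ints and dicts of strs.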
If this proposal finds sufficient support, I can provide a PR.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)