Posted to issues@spark.apache.org by "Bryan Cutler (Jira)" <ji...@apache.org> on 2020/07/12 22:43:00 UTC
[jira] [Created] (SPARK-32285) Add PySpark support for nested timestamps with arrow
Bryan Cutler created SPARK-32285:
------------------------------------
Summary: Add PySpark support for nested timestamps with arrow
Key: SPARK-32285
URL: https://issues.apache.org/jira/browse/SPARK-32285
Project: Spark
Issue Type: Sub-task
Components: PySpark, SQL
Affects Versions: 3.0.0
Reporter: Bryan Cutler
Currently, with Arrow optimizations enabled, post-processing is done in pandas on timestamp columns to localize the timezone. This is not done for timestamps nested inside columns such as StructType or ArrayType.
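A minimal sketch of the gap described above (the data and timezone are illustrative assumptions, not from the issue): a top-level timestamp Series can be localized with the .dt accessor, but when each row holds a struct-like dict the localization has to recurse into the nested values by hand.

```python
import pandas as pd

# Top-level timestamp column: the usual post-processing localizes it directly.
s = pd.Series(pd.to_datetime(["2020-07-12 22:43:00"]))
localized = s.dt.tz_localize("UTC")

# Nested case (rows are dicts, as a StructType column arrives in pandas):
# the .dt accessor does not apply, so each nested Timestamp must be
# localized individually.
nested = pd.Series([{"ts": pd.Timestamp("2020-07-12 22:43:00")}])
nested_localized = nested.apply(
    lambda row: {
        k: v.tz_localize("UTC") if isinstance(v, pd.Timestamp) else v
        for k, v in row.items()
    }
)
```

This per-row apply is what makes the pandas route awkward for nested data, which motivates the one-level-of-nesting first step below.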
Adding this support is needed for the Apache Arrow 1.0.0 upgrade, due to the use of structs with timestamps in the grouped-by key over a window.
As a simple first step, support for timestamps with one level of nesting could be added first; this would satisfy the immediate need.
NOTE: with Arrow 1.0.0, it might be possible to do the timezone processing with pyarrow.array.cast, which could be easier than doing it in pandas.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org