Posted to issues@arrow.apache.org by "Micah Kornfield (JIRA)" <ji...@apache.org> on 2019/01/31 06:48:00 UTC

[jira] [Commented] (ARROW-1425) [Python] Document semantic differences between Spark timestamps and Arrow timestamps

    [ https://issues.apache.org/jira/browse/ARROW-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756928#comment-16756928 ] 

Micah Kornfield commented on ARROW-1425:
----------------------------------------

[~icexelloss] I pushed a new PR for this, so if you don't mind, I will try to finish it up.

> [Python] Document semantic differences between Spark timestamps and Arrow timestamps
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-1425
>                 URL: https://issues.apache.org/jira/browse/ARROW-1425
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Assignee: Micah Kornfield
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Spark treats non-timezone-aware timestamps as session local, which can be problematic when using pyarrow: pyarrow may view the data coming from toPandas() as time zone naive, with fields as though they were UTC rather than session local. We should carefully document how to handle data coming from Spark to avoid problems.
> cc [~bryanc] [~holdenkarau]
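
For illustration only, the mismatch described in the issue can be sketched with the Python standard library, without Spark or pyarrow. The fixed UTC-08:00 offset standing in for a session time zone, the sample instant, and the helper names are all assumptions made for this example. It is not the documentation proposed in the PR.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-08:00 offset stands in for a Spark session time zone.
SESSION_TZ = timezone(timedelta(hours=-8))


def as_session_local_naive(instant, session_tz):
    """Render an instant as wall-clock fields in the session zone, tzinfo dropped."""
    return instant.astimezone(session_tz).replace(tzinfo=None)


def as_utc_naive(instant):
    """Render an instant as wall-clock fields in UTC, tzinfo dropped."""
    return instant.astimezone(timezone.utc).replace(tzinfo=None)


def localize_session_naive(naive, session_tz):
    """Re-attach the session zone to naive session-local fields."""
    return naive.replace(tzinfo=session_tz)


# One unambiguous instant (hypothetical sample value).
instant = datetime(2019, 1, 31, 6, 48, tzinfo=timezone.utc)

# Two naive renderings of the same instant: they differ by the session offset.
session_view = as_session_local_naive(instant, SESSION_TZ)
utc_view = as_utc_naive(instant)

# Misreading session-local fields as UTC shifts the instant by the offset.
misread = session_view.replace(tzinfo=timezone.utc)

# Re-attaching the session zone recovers the original instant.
recovered = localize_session_naive(session_view, SESSION_TZ)

print("session-local naive:", session_view)
print("UTC naive:          ", utc_view)
print("misread == instant: ", misread == instant)
print("recovered == instant:", recovered == instant)
```

The point of the sketch is that a naive timestamp carries no record of which convention produced its fields, so the consumer has to know the producer's session time zone to recover the instant.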



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)