Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/06/07 15:00:01 UTC

[jira] [Commented] (ARROW-1989) [Python] Better UX on timestamp conversion to Pandas

    [ https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858724#comment-16858724 ] 

Joris Van den Bossche commented on ARROW-1989:
----------------------------------------------

I'm looking into this, but I can't find a reproducible example that gives an error similar to the one reported above. Does somebody have a concrete example?

With the latest pandas and pyarrow (and likewise with pandas 0.24.2 / pyarrow 0.12), the closest I can get is something like this, using a timestamp with a lower resolution that is out of bounds for pandas:

{code:python}
In [63]: a = pa.array([datetime.datetime(1018, 12, 12)], type=pa.timestamp('s'))

In [64]: a.to_pandas()
Out[64]: array(['1018-12-12T00:00:00'], dtype='datetime64[s]')

In [65]: table = pa.Table.from_pydict({'a': a})

In [66]: table
Out[66]: 
pyarrow.Table
a: timestamp[s]

In [67]: table.to_pandas()
Out[67]: 
                              a
0 2188-01-19 23:09:07.419103232
{code}

This result is wrong, however, and it is produced silently, without any error. This is a bug in pandas, and it is described in https://issues.apache.org/jira/browse/ARROW-3176
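
For what it's worth, the 2188 value is consistent with a plain int64 overflow. The following is only a sketch using the standard library, under the assumption that the seconds value is multiplied by 10^9 to get nanoseconds without an overflow check. The helper wrap_int64 is hypothetical and only emulates two's-complement wraparound; it is not a pandas or pyarrow function.

```python
import datetime

def wrap_int64(value):
    """Emulate two's-complement int64 wraparound on a Python int (hypothetical helper)."""
    return (value + 2**63) % 2**64 - 2**63

epoch = datetime.datetime(1970, 1, 1)
ts = datetime.datetime(1018, 12, 12)

# Seconds since the epoch, as stored in a timestamp('s') array (negative here).
seconds = (ts - epoch) // datetime.timedelta(seconds=1)

# Assumed conversion step: seconds -> nanoseconds, with no overflow check.
nanos_exact = seconds * 10**9
nanos_wrapped = wrap_int64(nanos_exact)

# Python datetimes only have microsecond resolution, so the last three
# digits of the nanosecond value are dropped here.
wrapped_dt = epoch + datetime.timedelta(microseconds=nanos_wrapped // 1000)

print(nanos_exact < -2**63)   # True: the exact value does not fit in int64
print(nanos_wrapped)          # 6881065747419103232
print(wrapped_dt)             # 2188-01-19 23:09:07.419103
```

The wrapped nanosecond value ends in ...419103232, which matches the fractional seconds shown in the table.to_pandas() output above.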

> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
>                 Key: ARROW-1989
>                 URL: https://issues.apache.org/jira/browse/ARROW-1989
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.14.0
>
>
> When converting timestamp columns to Pandas, users often run into the problem that they have dates that are larger than Pandas can represent with its nanosecond representation. Currently they simply see an Arrow exception and think that this problem is caused by Arrow. We should try to change the error from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of 
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX. This conversion is needed as Pandas does only support nanosecond timestamps. Your data is likely out of the range that can be represented with nanosecond resolution.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)