Posted to issues@arrow.apache.org by "Krisztian Szucs (JIRA)" <ji...@apache.org> on 2018/09/09 13:26:00 UTC

[jira] [Commented] (ARROW-1989) [Python] Better UX on timestamp conversion to Pandas

    [ https://issues.apache.org/jira/browse/ARROW-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608447#comment-16608447 ] 

Krisztian Szucs commented on ARROW-1989:
----------------------------------------

{code:python}
In [45]: pa.array([datetime.date(2018, 12, 12)], type=pa.timestamp('s'))
---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
<ipython-input-45-f6eb2418d6b7> in <module>()
----> 1 pa.array([datetime.date(2018, 12, 12)], type=pa.timestamp('s'))

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib.array()
    169     else:
    170         # ConvertPySequence does strict conversion if type is explicitly passed
--> 171         return _sequence_to_array(obj, mask, size, type, pool, from_pandas)
    172
    173

~/Workspace/arrow/python/pyarrow/array.pxi in pyarrow.lib._sequence_to_array()
     33     cdef shared_ptr[CChunkedArray] out
     34     with nogil:
---> 35         check_status(ConvertPySequence(sequence, mask, options, &out))
     36
     37     if out.get().num_chunks() == 1:

~/Workspace/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
     89             raise ArrowNotImplementedError(message)
     90         elif status.IsTypeError():
---> 91             raise ArrowTypeError(message)
     92         elif status.IsCapacityError():
     93             raise ArrowCapacityError(message)

ArrowTypeError: an integer is required (got type datetime.date)
{code}

However, with datetime.datetime it works:

{code:python}
In [46]: pa.array([datetime.datetime(2018, 12, 12)], type=pa.timestamp('s'))
Out[46]:
<pyarrow.lib.TimestampArray object at 0x11d243638>
[
  1544572800
]
{code}

I think we should have a general solution for extending the low-level errors with extra, Python-related context.
The current error handling in Cython seems really lightweight: https://github.com/apache/arrow/blob/master/python/pyarrow/error.pxi#L71

Would it be OK to extend it with error-rewriting logic?
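
To make the question concrete, here is a rough sketch of what such rewriting logic might look like in plain Python. Everything in it is hypothetical: the `REWRITE_RULES` table, `rewrite_message`, `check_status_with_context`, and the wording of the hints are illustrations only, and the `ArrowTypeError` class is a stand-in so that the snippet runs without pyarrow. The real `check_status` in error.pxi works on a C++ status object, which this sketch does not model.

```python
import re


class ArrowTypeError(TypeError):
    """Stand-in for pyarrow's exception so the sketch runs without pyarrow."""


# Hypothetical rewrite rules: (pattern on the low-level message, hint to append).
REWRITE_RULES = [
    (
        re.compile(r"an integer is required \(got type datetime\.date\)"),
        "datetime.date values were passed where a timestamp type was requested; "
        "consider passing datetime.datetime objects instead.",
    ),
    (
        re.compile(r"Casting from timestamp\[\w+\] to timestamp\[\w+\] would lose data"),
        "Pandas only supports nanosecond timestamps; the data is likely outside "
        "the range representable at nanosecond resolution.",
    ),
]


def rewrite_message(message):
    """Append a Python-level hint to a low-level message if a rule matches."""
    for pattern, hint in REWRITE_RULES:
        if pattern.search(message):
            return f"{message}. {hint}"
    # No rule matched: leave the low-level message untouched.
    return message


def check_status_with_context(exc_type, message):
    """Hypothetical analogue of check_status: raise with the rewritten message."""
    raise exc_type(rewrite_message(message))
```

Keeping the rules in a single table would let new hints be added without touching the raising code, at the cost of matching on message strings, which is brittle if the C++ messages change.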

> [Python] Better UX on timestamp conversion to Pandas
> ----------------------------------------------------
>
>                 Key: ARROW-1989
>                 URL: https://issues.apache.org/jira/browse/ARROW-1989
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Uwe L. Korn
>            Priority: Major
>             Fix For: 0.11.0
>
>
> When converting timestamp columns to Pandas, users often run into dates that fall outside the range Pandas can represent with its nanosecond representation. Currently, they simply see an Arrow exception and think that the problem is caused by Arrow. We should try to change the error from
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX
> {code}
> to something along the lines of 
> {code}
> ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: XX. This conversion is needed as Pandas does only support nanosecond timestamps. Your data is likely out of the range that can be represented with nanosecond resolution.
> {code}
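
For reference, the range limit mentioned in the issue description can be worked out with the standard library alone, assuming timestamps are stored as signed 64-bit nanosecond counts since the Unix epoch. The names `NS_MIN`, `NS_MAX`, and `fits_in_nanoseconds` are made up for this sketch and are not part of pyarrow or pandas.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1

# datetime only has microsecond resolution, so both bounds are rounded
# inward to the nearest whole microsecond.
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX // 1000)


def fits_in_nanoseconds(value):
    """Return True if a naive datetime fits in an int64 nanosecond timestamp."""
    return NS_MIN <= value <= NS_MAX


print(NS_MIN, NS_MAX)
print(fits_in_nanoseconds(datetime(2018, 12, 12)))  # inside the range
print(fits_in_nanoseconds(datetime(3000, 1, 1)))    # beyond the upper bound
```

Dates before late 1677 or after early 2262 fall outside this window, which is the situation the proposed error message is meant to explain.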


