Posted to issues@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2019/09/06 06:26:00 UTC

[jira] [Updated] (ARROW-5125) [Python] Cannot roundtrip extreme dates through pyarrow

     [ https://issues.apache.org/jira/browse/ARROW-5125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5125:
----------------------------------
    Labels: parquet pull-request-available windows  (was: parquet windows)

> [Python] Cannot roundtrip extreme dates through pyarrow
> -------------------------------------------------------
>
>                 Key: ARROW-5125
>                 URL: https://issues.apache.org/jira/browse/ARROW-5125
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>         Environment: Windows 10, Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05)
>            Reporter: Max Bolingbroke
>            Assignee: Micah Kornfield
>            Priority: Major
>              Labels: parquet, pull-request-available, windows
>             Fix For: 0.15.0
>
>
> You can roundtrip many dates through a pyarrow array:
>  
> {noformat}
> >>> pa.array([datetime.date(1980, 1, 1)], type=pa.date32())[0]
> datetime.date(1980, 1, 1){noformat}
>  
> But (on Windows at least), not extreme ones:
>  
> {noformat}
> >>> pa.array([datetime.date(1960, 1, 1)], type=pa.date32())[0]
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__
>  File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py
> OSError: [Errno 22] Invalid argument
> >>> pa.array([datetime.date(3200, 1, 1)], type=pa.date32())[0]
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "pyarrow\scalar.pxi", line 74, in pyarrow.lib.ArrayValue.__repr__
>  File "pyarrow\scalar.pxi", line 226, in pyarrow.lib.Date32Value.as_py
> {noformat}
> This is because datetime.utcfromtimestamp and datetime.timestamp fail on these dates, but it seems we should be able to avoid invoking these functions entirely when deserializing dates. Ideally we would be able to roundtrip these as datetimes too, of course, but it is less clear that this will be easy. For some context, see [https://bugs.python.org/issue29097].
> This may be related to ARROW-3176 and ARROW-4746.
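
One way to avoid the timestamp functions, sketched here for illustration only: a date32 value is a signed count of days since 1970-01-01, so it can be converted with plain date arithmetic. The helper names below are hypothetical and are not pyarrow APIs; this is not necessarily how the eventual fix was implemented.

```python
import datetime

# date32 stores a signed 32-bit number of days relative to the Unix epoch.
_EPOCH = datetime.date(1970, 1, 1)


def date32_to_py(days):
    """Hypothetical helper: days since the epoch -> datetime.date.

    Uses timedelta arithmetic, so it never calls the platform-dependent
    utcfromtimestamp/timestamp functions.
    """
    return _EPOCH + datetime.timedelta(days=days)


def py_to_date32(d):
    """Hypothetical helper: datetime.date -> days since the epoch."""
    return (d - _EPOCH).days


# The dates from the report roundtrip, including the ones that failed.
for d in (datetime.date(1960, 1, 1),
          datetime.date(1980, 1, 1),
          datetime.date(3200, 1, 1)):
    assert date32_to_py(py_to_date32(d)) == d
```

The same approach should extend to datetimes by adding a timedelta to datetime.datetime(1970, 1, 1), within the range that datetime itself can represent.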


