Posted to issues@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/06/15 07:45:07 UTC

[GitHub] [arrow] jorisvandenbossche opened a new issue, #36084: [Python] Converting date32/64 to pandas using nanoseconds can silently overflow

jorisvandenbossche opened a new issue, #36084:
URL: https://github.com/apache/arrow/issues/36084

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   If you convert a date32 or date64 field to a numpy/pandas datetime64 (i.e. not datetime.date objects) using `date_as_object=False`, and the date is out of bounds for the target resolution (at the moment nanoseconds, but with https://github.com/apache/arrow/pull/35656 and recent pandas versions, this will become milliseconds), you silently get mangled values:
   
   ```
   >>> pa.array([datetime.date(2400, 1, 1)]).to_pandas(date_as_object=False)
   0   1815-06-13 00:25:26.290448384
   dtype: datetime64[ns]
   ```
   
   This is because we currently simply multiply the values to get nanoseconds, without bounds / overflow checking:
   
   https://github.com/apache/arrow/blob/b4ac585ecb4da610cc64e346e564ca86594aec53/python/pyarrow/src/arrow/python/arrow_to_pandas.cc#L1592-L1594
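   To illustrate (a pure-Python sketch, not Arrow's actual C++ code): 2400-01-01 is 157054 days after the Unix epoch, and 157054 * 86_400_000_000_000 ns exceeds INT64_MAX, so the unchecked int64 multiplication wraps around to a negative value that lands in 1815:

   ```python
   import numpy as np

   NS_PER_DAY = 86_400_000_000_000  # 24 * 60 * 60 * 10**9

   days = 157054  # 2400-01-01 as days since the Unix epoch (date32 storage)

   # Exact product as an arbitrary-precision Python int:
   product = days * NS_PER_DAY  # 13_569_465_600_000_000_000 > INT64_MAX

   # Emulate the C++ int64 wraparound that the unchecked multiply performs:
   wrapped = (product + 2**63) % 2**64 - 2**63

   print(np.datetime64(wrapped, "ns"))  # the mangled 1815-06-13 value
   ```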
   
   We could maybe use a cast instead, which already has proper bounds checking:
   
   ```
   >>> pa.array([datetime.date(2400, 1, 1)]).cast(pa.timestamp("ns"))
   ...
   ArrowInvalid: Casting from date32[day] to timestamp[ns] would result in out of bounds timestamp: 157054
   ```
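   
   For reference, a minimal Python-level sketch of the kind of bounds check a cast (or a checked multiplication) would enforce; the function name and error message are hypothetical, not Arrow API:

   ```python
   INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
   NS_PER_DAY = 86_400_000_000_000

   def days_to_ns_checked(days: int) -> int:
       """Convert days-since-epoch to nanoseconds, raising on int64 overflow."""
       ns = days * NS_PER_DAY  # exact: Python ints are arbitrary precision
       if not (INT64_MIN <= ns <= INT64_MAX):
           raise OverflowError(
               f"Casting date {days} to timestamp[ns] would be out of bounds")
       return ns
   ```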
   
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org